Check if number is in multiple strings and label - arrays

Here is the output I am working with:
Pool Name: Pool 2
Pool ID: 1
LUNs: 1015, 1080, 1034, 1016, 500, 1002, 1062, 1041, 1046, 1028, 1009, 1054, 513, 1058, 1070, 515, 1049, 1083, 1020, 1076, 19, 509, 1057, 1021, 525, 1019, 518, 1075, 29, 23, 1068, 37, 1064, 506, 1024, 1026, 1008, 1087, 1012, 1006, 1018, 502, 1004, 1074, 1030, 1032, 39, 1014, 1005, 1056, 1044, 2, 1033, 1001, 16, 1061, 1040, 1045, 1027, 26, 1023, 1053, 1037, 1079, 512, 520, 1069, 1039, 514, 1048, 1082, 523, 508, 524, 517, 522, 1066, 1089, 1067, 529, 528, 1063, 505, 1081, 527, 1007, 1086, 1051, 1011, 1035, 1017, 501, 1003, 1042, 1073, 1085, 1029, 1010, 24, 1013, 1055, 1043, 1059, 52, 1071, 516, 1050, 1084, 1000, 1077, 1060, 1072, 510, 1022, 1052, 526, 1036, 1078, 511, 35, 519, 1038, 521, 1047, 507, 6, 1065, 1025, 1088, 503, 53, 1031, 504
Pool Name: Pool 1
Pool ID: 0
LUNs: 9, 3, 34, 10, 12, 8, 7, 0, 38, 27, 18, 4, 42, 21, 17, 28, 36, 22, 13, 5, 11, 25, 15, 32, 1
Pool Name: Pool 4
Pool ID: 2
LUNs: (this one is empty)
What I would like to do is store each of the "LUNs:" lists in its own variable (an array?). Then take my number and search for it in all of the arrays; in this example there are three. If it matches my number, for example "34", the program will output: Your number is in Pool 1.
I know how to pull the LUN lines I need with regex expressions, and I know how to compare the results with an if statement, but I get lost combining the two, and even more lost when thinking about how to output the correct "Pool Name".
EDIT
I should add that the total number of pools can change, as can the LUN number lists.

Convert the output into a single string, replace the colons with equals signs, and split the string at double line breaks; then convert the fragments into objects using ConvertFrom-StringData and New-Object, and split each LUN string into an array:
$data = ... | Out-String
$pools = $data -replace ': +','=' -split "`r`n`r`n" |
    % { New-Object -Type PSCustomObject -Property (ConvertFrom-StringData $_) } |
    select -Property *,@{n='LUNs';e={$_.LUNs -split ', '}} -Exclude LUNs
With that you can get the pool name of a pool containing a given LUN like this:
$pools | ? { $_.LUNs -contains 34 } | select -Expand 'Pool Name'
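To produce the exact message the question asks for, here's a small follow-up sketch (my addition) using the $pools objects built above:
$number = 34
$match = $pools | ? { $_.LUNs -contains $number }
if ($match) {
    "Your number is in $($match.'Pool Name')"
} else {
    "$number was not found in any pool"
}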

I'm sure there's an easier way...
Is that what you need?
$Number = 42
$Lun1 = 1015, 1080, 1034, 1016, 500, 1002, 1062, 1041, 1046, 1028, 1009, 1054, 513, 1058, 1070
$Lun2 = 9, 3, 34, 10, 12, 8, 7, 0, 38, 27, 18, 4, 42, 21, 17, 28, 36, 22, 13, 5, 11, 25, 15, 32
$Lun3 = $null
$Lun1Length = $Lun1.Length
$Lun2Length = $Lun2.Length
$Lun3Length = $Lun3.Length
[Array]$Luns = $Lun1, $Lun2, $Lun3
foreach ($Lun in $Luns)
{
    if ($Lun -contains $Number)
    {
        # Caveat: the switch keys on array length, so two pools with the
        # same number of LUNs would both match the first case.
        Switch ($Lun.Length)
        {
            $Lun1Length {"$Number in Lun1"}
            $Lun2Length {"$Number in Lun2"}
            $Lun3Length {"$Number in Lun3"}
        }
    }
}
Output:
42 in Lun2
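A simpler variant (just a sketch) indexes the arrays directly instead of switching on lengths, which also avoids mislabeling two pools that happen to contain the same number of LUNs:
for ($i = 0; $i -lt $Luns.Count; $i++) {
    if ($Luns[$i] -contains $Number) { "$Number in Lun$($i + 1)" }
}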

Pearson hash 8-bit implementation is producing very non-uniform values

I am implementing a Pearson hash in order to create a lightweight dictionary structure for a C project which requires a table of file names paired with file data; I want the nice constant-time search property of hash tables. I'm no math expert, so I looked up good text hashes and Pearson came up, with claims that it is effective and has a good distribution. I tested my implementation and found that no matter how I vary the table size or the maximum filename length, the hash is very inefficient, with, for example, 18/50 buckets being left empty. I trust Wikipedia not to be lying, and yes, I am aware I can just download a third-party hash table implementation, but I would dearly like to know why my version isn't working.
In the following code (a function to insert values into the table), "csString" is the filename (the string to be hashed), "cLen" is the length of the string, "pData" is a pointer to some data which is inserted into the table, and "pTable" is the table struct. The initial condition cHash = cLen - csString[0] is something I experimentally found to marginally improve uniformity. I should add that I am testing the table with entirely randomised strings (using rand() to generate ASCII values) with lengths randomised within a certain range; this makes it easy to generate and test large numbers of values.
typedef struct StaticStrTable {
    unsigned int nRepeats;
    unsigned char nBuckets;
    unsigned char nMaxCollisions;
    void** pBuckets;
} StaticStrTable;
static const unsigned char cPerm256[256] = { /* unsigned char: several values here exceed the range of a signed char */
227, 117, 238, 33, 25, 165, 107, 226, 132, 88, 84, 68, 217, 237, 228, 58, 52, 147, 46, 197, 191, 119, 211, 0, 218, 139, 196, 153, 170, 77, 175, 22, 193, 83, 66, 182, 151, 99, 11, 144, 104, 233, 166, 34, 177, 14, 194, 51, 30, 121, 102, 49,
222, 210, 199, 122, 235, 72, 13, 156, 38, 145, 137, 78, 65, 176, 94, 163, 95, 59, 92, 114, 243, 204, 224, 43, 185, 168, 244, 203, 28, 124, 248, 105, 10, 87, 115, 161, 138, 223, 108, 192, 6, 186, 101, 16, 39, 134, 123, 200, 190, 195, 178,
164, 9, 251, 245, 73, 162, 71, 7, 239, 62, 69, 209, 159, 3, 45, 247, 19, 174, 149, 61, 57, 146, 234, 189, 15, 202, 89, 111, 207, 31, 127, 215, 198, 231, 4, 181, 154, 64, 125, 24, 93, 152, 37, 116, 160, 113, 169, 255, 44, 36, 70, 225, 79,
250, 12, 229, 230, 76, 167, 118, 232, 142, 212, 98, 82, 252, 130, 23, 29, 236, 86, 240, 32, 90, 67, 126, 8, 133, 85, 20, 63, 47, 150, 135, 100, 103, 173, 184, 48, 143, 42, 54, 129, 242, 18, 187, 106, 254, 53, 120, 205, 155, 216, 219, 172,
21, 253, 5, 221, 40, 27, 2, 179, 74, 17, 55, 183, 56, 50, 110, 201, 109, 249, 128, 112, 75, 220, 214, 140, 246, 213, 136, 148, 97, 35, 241, 60, 188, 180, 206, 80, 91, 96, 157, 81, 171, 141, 131, 158, 1, 208, 26, 41
};
#include <stdint.h>  /* for intptr_t */

void InsertStaticStrTable(char* csString, unsigned char cLen, void* pData, StaticStrTable* pTable) {
    unsigned char cHash = cLen - csString[0];
    /* cast to unsigned char so a high-bit character can't produce a negative index */
    for (int i = 0; i < cLen; ++i) cHash ^= cPerm256[cHash ^ (unsigned char)csString[i]];
    unsigned short cTableIndex = cHash % pTable->nBuckets;
    long long* pBucket = pTable->pBuckets[cTableIndex];
    // Inserts data and records how many collisions there are - it may look weird, as the way
    // I decided to pack the data into the table buffer is very compact and arbitrary.
    // It won't affect the hash though, which is the key issue!
    for (int i = 0; i < pTable->nMaxCollisions; ++i) {
        if (i == 1) {
            pTable->nRepeats++;
        }
        long long* pSlotID = pBucket + (i << 1);  /* two long longs per slot: key, data */
        if (pSlotID[0] == 0) {
            pSlotID[0] = (long long)(intptr_t)csString;  /* casts: pointers stored in integer slots */
            pSlotID[1] = (long long)(intptr_t)pData;
            break;
        }
    }
}
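To quantify "very non-uniform", here is a minimal standalone driver (my addition, not the original test harness; it assumes it is pasted below the cPerm256 table above) that hashes random printable strings with the same steps as InsertStaticStrTable and counts empty buckets:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    enum { NBUCKETS = 50, NSTRINGS = 50, MINLEN = 4, MAXLEN = 16 };
    int counts[NBUCKETS] = { 0 };
    srand(42);
    for (int s = 0; s < NSTRINGS; ++s) {
        char str[MAXLEN + 1];
        int len = MINLEN + rand() % (MAXLEN - MINLEN + 1);
        for (int i = 0; i < len; ++i)
            str[i] = (char)(32 + rand() % 95);  /* random printable ASCII */
        str[len] = '\0';
        /* same hashing steps as InsertStaticStrTable */
        unsigned char cHash = (unsigned char)(len - str[0]);
        for (int i = 0; i < len; ++i)
            cHash ^= cPerm256[cHash ^ (unsigned char)str[i]];
        counts[cHash % NBUCKETS]++;
    }
    int empty = 0;
    for (int b = 0; b < NBUCKETS; ++b)
        if (counts[b] == 0) ++empty;
    printf("%d of %d buckets empty\n", empty, NBUCKETS);
    return 0;
}
With 50 strings over 50 buckets this typically prints around 18 empty buckets, which, as the next post shows, is exactly what uniform hashing predicts.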
FYI (This is not an answer, I just need the formatting)
These are just single runs from a simulation, YMMV.
distributing 50 elements randomly over 50 bins:
kalender_size=50 nperson = 50
E/cell | Ncell |   frac   | Nelem |   frac   | h/cell | hops | Cumhops
-------+-------+----------+-------+----------+--------+------+--------
   0   |   18  | 0.360000 |    0  | 0.000000 |    0   |   0  |    0
   1   |   18  | 0.360000 |   18  | 0.360000 |    1   |  18  |   18
   2   |   10  | 0.200000 |   20  | 0.400000 |    3   |  30  |   48
   3   |    4  | 0.080000 |   12  | 0.240000 |    6   |  24  |   72
-------+-------+----------+-------+----------+--------+------+--------
 total |   50  |          |   50  |          |        |      |   72   (1.440000 hops/element)
Similarly: distributing 356 persons over a 356-day birthday calendar (ignoring leap days ...):
kalender_size=356 nperson = 356
E/cell | Ncell |   frac   | Nelem |   frac   | h/cell | hops | Cumhops
-------+-------+----------+-------+----------+--------+------+--------
   0   |  129  | 0.362360 |    0  | 0.000000 |    0   |    0 |     0
   1   |  132  | 0.370787 |  132  | 0.370787 |    1   |  132 |   132
   2   |   69  | 0.193820 |  138  | 0.387640 |    3   |  207 |   339
   3   |   19  | 0.053371 |   57  | 0.160112 |    6   |  114 |   453
   4   |    6  | 0.016854 |   24  | 0.067416 |   10   |   60 |   513
   5   |    1  | 0.002809 |    5  | 0.014045 |   15   |   15 |   528
-------+-------+----------+-------+----------+--------+------+--------
 total |  356  |          |  356  |          |        |      |   528   (1.483146 hops/element)
For N items over N slots, the expected number of empty slots equals the expected number of slots holding exactly one item; the expected fraction is 1/e for both.
The final number (1.483146) is the number of ->next pointer traversals per found element (when using a chained hash table). Any optimal hash function will come close to 1.5.
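As a sanity check on those fractions (an editorial aside): under uniform hashing, the occupancy of a single slot when N items go into N slots tends to the Poisson distribution with mean 1, so

    P(k items in a slot) = e^(-1) / k!
    P(0) ≈ 0.368,  P(1) ≈ 0.368,  P(2) ≈ 0.184,  P(3) ≈ 0.061

which lines up with the 0.36 / 0.36 / 0.20 / 0.08 columns above. In other words, 18/50 empty buckets is not a defect of the Pearson hash in the previous question; it is just about what an ideal hash produces.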

How to Implement T-SQL CHECKSUM() in JavaScript for BigQuery?

The end result I'm looking for is to implement T-SQL CHECKSUM in BigQuery with a JavaScript UDF. I would settle for having the C/C++ source code to translate, but if someone has already done this work then I'd love to use it.
Alternatively, if someone can think of a way to create an equivalent hash code between strings stored in Microsoft SQL Server compared to those in BigQuery then that would help me too.
UPDATE: I've found some source code, through HABO's link in the comments, which is written in T-SQL and performs the same CHECKSUM, but I'm having difficulty converting it to JavaScript, which inherently cannot handle 64-bit integers. I'm playing with some small examples and have found that the algorithm works on the low nibble of each byte only.
UPDATE 2: I got really curious about replicating this algorithm, and I can see some definite patterns, but my brain isn't up to the task of distilling them into a reverse-engineered solution. I did find that BINARY_CHECKSUM() and CHECKSUM() return different things, so the work done on the former didn't help me with the latter.
I spent the day reverse engineering this by first dumping all results for single ASCII characters as well as pairs. This showed that each character has its own distinct "XOR code", and that letters have the same one regardless of case. The algorithm was remarkably simple to figure out after that: rotate 4 bits left, then XOR with the code stored in a lookup table.
var xorcodes = [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
];
function rol(x, n) {
    // rotate left: use the unsigned shift >>> here, because >> would drag the sign bit along
    return (x << n) | (x >>> (32 - n));
}
function checksum(s) {
    var checksum = 0;
    for (var i = 0; i < s.length; i++) {
        checksum = rol(checksum, 4);
        var c = s.charCodeAt(i);
        var xorcode = 0;
        if (c < xorcodes.length) {
            xorcode = xorcodes[c];
        }
        checksum ^= xorcode;
    }
    return checksum;
}
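A quick usage sketch (my addition; the output value isn't asserted here, so compare it yourself against SELECT CHECKSUM('foobar') on your own server):
console.log(checksum("foobar"));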
See https://github.com/neilodonuts/tsql-checksum-javascript for more info.
DISCLAIMER: I've only worked on compatibility with VARCHAR strings in SQL Server with the collation set to SQL_Latin1_General_CP1_CI_AS. This won't work with multiple columns or integers, but I'm sure the underlying algorithm uses the same codes, so it wouldn't be hard to figure out. It also seems to differ from db<>fiddle, possibly due to collation: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ... mileage may vary!
FYI, for those of you who are stuck in T-SQL legacy mode, here's a C# implementation that was tested and looks good for most strings/ints that I've been working with:
public static int[] xorcodes = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
};
public static int rol(int x, int n) {
    // rotate left: cast through uint so the right shift is logical rather than sign-extending
    return ((int)x << n) | ((int)((uint)x >> (32 - n)));
}
public static int checksum(string s) {
    int checksum = 0;
    for (var i = 0; i < s.Length; i++) {
        checksum = rol(checksum, 4);
        var c = (int)s[i];
        int xorcode = 0;
        if (c < xorcodes.Length) {
            xorcode = xorcodes[c];
        }
        checksum ^= xorcode;
    }
    return checksum;
}

erlang:odbc - cannot get correct query results in unicode from mssql

What I'm trying to do: just run a select with erlang-odbc from Elixir and dump the whole result to the console.
Environment:
my side:
Red Hat Enterprise Linux Server release 7.6 (Maipo)
unixODBC-devel (yum)
Elixir 1.8.2 (compiled with Erlang/OTP 20)
Erlang/OTP 21
erlang-odbc R16B (yum)
(either) the MS SQL ODBC driver
(or) the FreeTDS driver 1.1.6 (compiled from source --with-unixodbc)
target:
MS SQL Server 2016
table:
CREATE TABLE db_name.dbo.rating (
[Year] int NULL,
Code varchar(256) COLLATE Cyrillic_General_CI_AS NULL,
Name nvarchar(4000) COLLATE Cyrillic_General_CI_AS NULL,
GroupeCode varchar(256) COLLATE Cyrillic_General_CI_AS NULL,
GroupeName nvarchar(4000) COLLATE Cyrillic_General_CI_AS NULL,
Cost numeric(38,5) NOT NULL,
PrchasesCount int NULL
) GO
nvarchar:
character_set_name: UNICODE
collation_name: Cyrillic_General_CI_AS
varchar:
character_set_name: cp1251
collation_name: Cyrillic_General_CI_AS
The Elixir code looks like this:
conn_str =
  "SERVER=XX.XX.XX.XX,1433;" <>
    # tried this too! "DRIVER={ODBC Driver 17 for SQL Server};" <>
    "DRIVER=FreeTDS;" <>
    "DATABASE=db_name;UID=bot;PWD=XXXXXX;"
  |> to_charlist

statement = "select top(3) Name from Rating order by Cost desc" |> to_charlist

{:ok, pid} = :odbc.connect(conn_str, [])
{:selected, col_names, rows} = :odbc.sql_query(pid, statement)
and, after all attempts, I get something like this as the result:
{:selected, ['Name'],
[
{<<32, 4, 48, 4, 49, 4, 62, 4, 66, 4, 75, 4, 32, 0, 65, 4, 66, 4, 64, 4, 62,
4, 56, 4, 66, 4, 53, 4, 59, 4, 76, 4, 61, 4, 75, 4, 53, 4, 32, 0, 63, 4,
62, 4, 32, ...>>},
{<<16, 4, 64, 4, 53, 4, 61, 4, 52, 4, 48, 4, 32, 0, 63, 4, 48, 4, 65, 4, 65,
4, 48, 4, 54, 4, 56, 4, 64, 4, 65, 4, 58, 4, 62, 4, 51, 4, 62, 4, 32, 0,
66, 4, ...>>},
{<<35, 4, 65, 4, 59, 4, 67, 4, 51, 4, 56, 4, 32, 0, 63, 4, 62, 4, 32, 0, 64,
4, 53, 4, 58, 4, 67, 4, 59, 4, 76, 4, 66, 4, 56, 4, 50, 4, 48, 4, 70, 4,
56, ...>>}
]}
instead of the correct Cyrillic text.
These are not random numbers! The result is always the same.
The MS driver gives the same result as FreeTDS.
What else I've tried:
changing the connection option binary_strings: :on/:off
setting the option client charset = UTF-8 in freetds.conf (in the global section)
scratching my head
using :unicode functions to read the << data >>
calling is_binary() on the received values, which returns true
Questions:
What type of data am I receiving?
Why is the data not decoded correctly?
Which component is responsible for this?
How can I fix it?
Here is part of the FreeTDS log (about iconv):
iconv.c:326:tds_iconv_open(0x1e02330, UTF-8)
iconv.c:186:local name for ISO-8859-1 is ISO-8859-1
iconv.c:186:local name for UTF-8 is UTF-8
iconv.c:186:local name for UCS-2LE is UCS-2LE
iconv.c:186:local name for UCS-2BE is UCS-2BE
iconv.c:348:setting up conversions for client charset "UTF-8"
iconv.c:350:preparing iconv for "UTF-8" <-> "UCS-2LE" conversion
iconv.c:389:tds_iconv_open: done
iconv.c:785:setting server single-byte charset to "CP1251"
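For what it's worth, the byte pattern in the result answers the first question: <<32, 4, 48, 4, 49, 4, ...>> is UCS-2LE (UTF-16 little-endian) text. The code units 0x0420, 0x0430, 0x0431, ... decode as Cyrillic, so the first row begins "Работы строительные по ...", and UCS-2LE matches the "UTF-8" <-> "UCS-2LE" conversion FreeTDS reports above. A minimal decoding sketch (my addition; it assumes rows arrive as binaries, e.g. with binary_strings: :on):

{:selected, _cols, rows} = :odbc.sql_query(pid, statement)

names =
  Enum.map(rows, fn {name} ->
    :unicode.characters_to_binary(name, {:utf16, :little}, :utf8)
  end)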
If you're using SQL Server 2019, you're going to want to use a collation that ends in _UTF8 rather than _AS. There's a full article here with details on collation types:
https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017#utf-8-support
Prior to MSSQL 2019, I know this works for Python 3.

Ruby array conversion

I have a string of digits:
s = "12345678910"
As you can see, it is the numbers 1 through 10 listed in increasing order. I want to convert it to an array of those numbers:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
How can I do it?
How about this:
a = ["123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899"]
b = a.first.each_char.map { |n| n.to_i }
if b.size > 8
  c = b[0..8]
  c += b[9..b.size].each_slice(2).map(&:join).map(&:to_i)
end
# It would yield as follows:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
For numbers beyond 99, you'd have to modify the slicing accordingly; one possible generalization is sketched below.
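Assuming the string is the consecutive run 1, 2, 3, ... (a sketch of my own, not from the answer above), one can peel off exactly as many characters as the next expected integer has digits:

def split_sequential(s)
  a = []
  n = 1
  until s.empty?
    d = n.to_s.length   # digits the next expected number occupies
    a << s[0, d].to_i
    s = s[d..-1]
    n += 1
  end
  a
end

split_sequential("12345678910")
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]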
Assuming a monotonic sequence, here's my run at it.
input = s.chars
output = []
previous_int = 0
until input.empty?
  temp = []
  temp << input.shift until temp.join.to_i > previous_int
  previous_int = temp.join.to_i
  output << previous_int
end
puts output.to_s
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Assumptions
the first (natural) number extracted from the string is the first character of the string converted to an integer;
if the number n is extracted from the string, the next number extracted, m, satisfies n <= m (i.e., the sequence is monotonically non-decreasing);
if n is extracted from the string, the next number extracted will have as few digits as possible (i.e., at most one greater than the number of digits in n); and
there is no need to check the validity of the string (e.g., "54632" is invalid).
Code
def split_it(str)
  return [] if str.empty?
  a = [str[0]]
  offset = 1
  while offset < str.size
    sz = a.last.size
    sz += 1 if str[offset, sz] < a.last
    a << str[offset, sz]
    offset += sz
  end
  a.map(&:to_i)
end
Examples
split_it("12345678910")
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
split_it("12343636412252891407189118901")
#=> [1, 2, 3, 4, 36, 36, 41, 225, 289, 1407, 1891, 18901]

(C++11) What's the difference between static array and dynamic array with list initialized?

For instance, there is an int array with thousands of elements:
static int st_indices[9999] = {
0, 27, 26, 1, 41, 71, 0, 26, 101, 0, 101, 131, 0, 131, 72,
1, 71, 176, 2, 56, 206, 3, 116, 236, 4, 146, 266, 5, 161, 296,
......
};
and
int* dy_indices = new int[9999] {
0, 27, 26, 1, 41, 71, 0, 26, 101, 0, 101, 131, 0, 131, 72,
1, 71, 176, 2, 56, 206, 3, 116, 236, 4, 146, 266, 5, 161, 296,
......
};
What's the difference between the two approaches above, especially regarding how the values in the curly braces affect memory usage?
I know that st_indices will live in memory until the program ends (stack?), and that dy_indices will be released after delete[] (heap). Or is this a question about the stack vs. the .DATA segment?
Static arrays are sized at compile time (a set amount of memory; note that a static-duration array like this actually lives in the executable's data segment, not the stack).
Dynamic arrays are allocated at run time (dynamic allocation, which can be any size depending on system limits, aka the heap).
From #Dr.Kameleon's answer, I learned that the OS reads the contents of the executable file and loads it into memory.
That is, the data in the curly braces is baked into the executable image and loaded into the data segment of memory. If we don't take virtual memory/paging into account, putting the data in a file and then reading it in will reduce the memory usage (for an OpenGL app).
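Following that idea, here is a minimal sketch of the file-based alternative (the file name, element count, and raw little-endian on-disk layout are my assumptions, not from the original post):

#include <cstdint>
#include <fstream>
#include <vector>

// Read `count` raw 32-bit ints from a binary file into heap storage,
// so the values are never baked into the executable image.
std::vector<int32_t> load_indices(const char* path, std::size_t count) {
    std::vector<int32_t> v(count);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(count * sizeof(int32_t)));
    return v;  // lives on the heap; nothing in .DATA
}

// e.g. auto dy_indices = load_indices("indices.bin", 9999);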
