Query data from multiple sorted-set keys in Redis

I have several sorted sets stored in redis. Like:
ZADD tag:1 1 1 2 2 3 3 4 4 5 5 6 6
ZADD tag:2 21 1 22 2 23 3 24 4 25 5 26 6
ZADD tag:3 31 1 32 2 33 3 34 4 35 5 36 6
Here is my question: I want to get the data sorted by score across tag:1 and tag:2, or tag:1 and tag:3, or tag:1, tag:2, and tag:3. That means I need to get data from different key combinations ([1], [2], [3], [1,2,3], [1,2], [2,3], ...). I have hundreds of these sorted sets, and each one can be combined with any one, two, or more of the others.
I would rather not use ZUNIONSTORE, because every combination is temporary: ZUNIONSTORE creates a new sorted set, and that set is very unlikely to be reused. Is there a good way to solve this, or another approach that would help? Thanks in advance!

Despite your reluctance, use ZUNIONSTORE for this. Once you're done, just DEL the result. This workflow can be embedded in a Lua script that performs the actions and returns the unified result.
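The workflow above can be sketched as a single Lua script sent from Python. This is a minimal sketch, not a definitive implementation: it assumes a redis-py style client whose eval(script, numkeys, *keys_and_args) method is available, and the names UNION_SCRIPT, union_sorted_sets and the temporary key tmp:union are hypothetical. The temporary key is passed in KEYS so the script declares every key it touches.

```python
# Lua script run atomically by Redis: union the source sets into a temporary
# key, read the merged result with scores, then delete the temporary key.
UNION_SCRIPT = """
local tmp = KEYS[#KEYS]
redis.call('ZUNIONSTORE', tmp, #KEYS - 1, unpack(KEYS, 1, #KEYS - 1))
local result = redis.call('ZRANGE', tmp, 0, -1, 'WITHSCORES')
redis.call('DEL', tmp)
return result
"""


def union_sorted_sets(client, keys, tmp_key="tmp:union"):
    """Return the union of `keys` as a flat [member, score, ...] reply.

    `client` is assumed to expose eval(script, numkeys, *keys_and_args),
    as redis-py clients do. `tmp_key` is a hypothetical scratch key name.
    """
    all_keys = list(keys) + [tmp_key]  # the last key is the scratch destination
    return client.eval(UNION_SCRIPT, len(all_keys), *all_keys)
```

With redis-py this would be called as union_sorted_sets(redis.Redis(), ["tag:1", "tag:2"]). Note that ZUNIONSTORE sums the scores of members that appear in several sets by default; its AGGREGATE option (SUM, MIN or MAX) changes that.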

Related

How to convert data frame columns values into an array without loop

I have a data frame like this:
df = pd.DataFrame({'A': [10,10,11,14], 'B':[2,3,3,5]})
It looks like this:
A B
0 10 2
1 10 3
2 11 3
3 14 5
I want to convert to this, with A as the row index, and store B's values inside the array or matrix:
10 2 3
11 3
14 5
Is there a Python way of doing this without looping over each row of the data frame df?
Many thanks.
Use groupby:
df.groupby('A')
Then you can (for instance) get the mean of the grouped version by:
df.groupby('A').mean()
which results in:
B
A
10 2.5
11 3.0
14 5.0
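The mean is only one possible aggregation. To get the layout asked for in the question (each value of A mapped to the B values in its group), one option is to collect each group into a list. A minimal sketch, assuming pandas is installed; the names grouped and as_dict are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 10, 11, 14], 'B': [2, 3, 3, 5]})

# Collect the B values of every A group into a Python list.
# The result is a Series indexed by the distinct values of A.
grouped = df.groupby('A')['B'].apply(list)

# Optional: a plain dict keyed by A, if a Series is not convenient.
as_dict = grouped.to_dict()
print(grouped)
```

The groups have different lengths here, so the result is a Series of lists rather than a rectangular matrix.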

Julia: Sort the columns of a matrix by the values in another vector (in place...)?

I am interested in sorting the columns of a matrix in terms of the values in 2 other vectors. As an example, suppose the matrix and vectors look like this:
M = [ 1 2 3 4 5 6 ;
7 8 9 10 11 12 ;
13 14 15 16 17 18 ]
v1 = [ 2 , 6 , 6 , 1 , 3 , 2 ]
v2 = [ 3 , 1 , 2 , 7 , 9 , 1 ]
I want to sort the columns of M in terms of their corresponding values in v1 and v2, with v1 taking precedence over v2. Additionally, I am interested in sorting the matrix in place, as the matrices I am working with are very large. Currently, my crude solution looks like this:
MM = [ v1' ; v2' ; M ] ; ## concatenate the vectors with the matrix
MM[:,:] = sortcols(MM , by=x->(x[1],x[2]))
M[:,:] = MM[3:end,:]
which gives the desired result:
3x6 Array{Int64,2}:
4 6 1 5 2 3
10 12 7 11 8 9
16 18 13 17 14 15
Clearly my approach is not ideal, as it requires computing and storing intermediate matrices. Is there a more efficient/elegant approach for sorting the columns of a matrix in terms of 2 other vectors? And can it be done in place to save memory?
Previously I have used sortperm for sorting an array in terms of the values stored in another vector. Is it possible to use sortperm with 2 vectors (and in-place)?
I would probably do it this way:
julia> cols = sort!([1:size(M,2);], by=i->(v1[i],v2[i]));
julia> M[:,cols]
3×6 Array{Int64,2}:
4 6 1 5 2 3
10 12 7 11 8 9
16 18 13 17 14 15
This should be pretty fast and uses only one temporary vector and one copy of the matrix. It's not fully in-place, but doing this operation completely in-place is not easy. You would need a sorting function that moves columns as it works, or alternatively a version of permute! that works on columns. You could start with the code for permute!! in combinatorics.jl and modify it to permute columns, reusing a single column-size temporary buffer.
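The column-permuting idea described above can be sketched in Python on a list-of-rows matrix. This is an illustration of the cycle-following technique under stated assumptions, not the code from combinatorics.jl: the function names are hypothetical, and besides the single column-size buffer it also keeps one boolean flag per column to track finished cycles.

```python
def sort_order(v1, v2):
    """Column order sorted by v1 first, then v2 (0-based indices)."""
    return sorted(range(len(v1)), key=lambda i: (v1[i], v2[i]))


def permute_columns_in_place(matrix, perm):
    """Rearrange columns so that new column j is old column perm[j].

    Follows each cycle of the permutation, using one column-size buffer.
    """
    done = [False] * len(perm)
    for start in range(len(perm)):
        if done[start]:
            continue
        buffer = [row[start] for row in matrix]  # save the cycle's first column
        j = start
        while True:
            src = perm[j]
            done[j] = True
            if src == start:
                # Close the cycle with the saved column.
                for row, value in zip(matrix, buffer):
                    row[j] = value
                break
            for row in matrix:
                row[j] = row[src]  # pull the source column into place
            j = src


M = [[1, 2, 3, 4, 5, 6],
     [7, 8, 9, 10, 11, 12],
     [13, 14, 15, 16, 17, 18]]
v1 = [2, 6, 6, 1, 3, 2]
v2 = [3, 1, 2, 7, 9, 1]

permute_columns_in_place(M, sort_order(v1, v2))
print(M[0])  # [4, 6, 1, 5, 2, 3]
```

The first row matches the first row of the Julia output shown above.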

Extracting array values based on values in different dimension

I've got a problem with subsetting values of an array.
raw.table <- array(data = c(1:12,13:24,rep(1:6, each=2)),
dim=c(3,4,3),
dimnames=list(LETTERS[1:3],1:4,c("target","ctrl","samples")))
The first two dimensions of my array represent some values that I want to do statistics on and the higher dimensions contain different attributes I want to use to access specific subsets. In this case I have only sample numbers, whereas there are always two values assigned to the same sample number (measurement replicates).
, , target
1 2 3 4
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
, , ctrl
1 2 3 4
A 13 16 19 22
B 14 17 20 23
C 15 18 21 24
, , samples
1 2 3 4
A 1 2 4 5
B 1 3 4 6
C 2 3 5 6
How do I access the values in the first slice of the third dimension (= target) that share the same sample number, as given in the third slice (= samples)? I tried different approaches using unique(), duplicated() and match(), but without getting a result. I just cannot wrap my head around the indexing of arrays -.-
Cheers,
zuup
Form a logical index with a logical test (across dimensions):
> raw.table[,,1] == raw.table[,,3]
1 2 3 4
A TRUE FALSE FALSE FALSE
B FALSE FALSE FALSE FALSE
C FALSE FALSE FALSE FALSE
Then use it to select items from the first slice (since both sides have equal length, there is no recycling):
> raw.table[, , 1 ][ raw.table[,,1] == raw.table[,,3] ]
[1] 1
Chaining calls to the Extract operator is perfectly acceptable in R.
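The same logical-index idea can be sketched in plain Python on nested lists. The data mirrors the target and samples slices printed in the question; the function names are hypothetical. The second function is an assumption about what the question is after: it collects the target values that share a sample number, which the equality mask alone does not do.

```python
# The "target" and "samples" slices from the question, stored row by row.
target = [[1, 4, 7, 10],
          [2, 5, 8, 11],
          [3, 6, 9, 12]]
samples = [[1, 2, 4, 5],
           [1, 3, 4, 6],
           [2, 3, 5, 6]]


def select_where_equal(values, other):
    """Keep the entries of `values` that equal `other` at the same position.

    Entries are visited column by column, the order R uses for matrices.
    """
    rows, cols = len(values), len(values[0])
    return [values[r][c]
            for c in range(cols) for r in range(rows)
            if values[r][c] == other[r][c]]


def group_by_label(values, labels):
    """Collect the entries of `values` under their label from `labels`."""
    rows, cols = len(values), len(values[0])
    groups = {}
    for c in range(cols):
        for r in range(rows):
            groups.setdefault(labels[r][c], []).append(values[r][c])
    return groups


print(select_where_equal(target, samples))  # [1]
print(group_by_label(target, samples))
```

The first call reproduces the single value returned by the R expression above; the second returns one list of replicate values per sample number.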

How to create Orthogonal array?

Suppose we have following three factors:
Factor A: 5 possible values
Factor B: 4 possible values
Factor C: 2 possible values
How can I construct an Orthogonal array for these?
The main thing I don't understand is how to make the combinations. I remember we used to follow patterns like '11112222', '11221122', '12121212', but it seems everyone has a different approach for filling the values in the array.
Is there any standard approach?
There isn't a single neat algorithm that generates orthogonal arrays to order. Instead there are a variety of constructions that have been discovered in a host of different areas of mathematics, and some techniques for modifying orthogonal arrays to change their parameters in some way or another. For instance see http://www.itl.nist.gov/div898/handbook/pri/section3/pri33a.htm and http://www.win.tue.nl/~aeb/preprints/oa3.pdf. Many statistics packages have an orthogonal array design utility which uses these rules and a list of known orthogonal arrays to try and find an orthogonal array that will satisfy the requirements it has been given.
In your case I can find nothing closer at the moment than the design with six five-level factors at http://www.york.ac.uk/depts/maths/tables/l25.htm, which uses 25 runs. You can certainly discard three columns. Where the design has five levels but the experiment has only 4 (or 2), I would be inclined to relabel consistently, e.g. {1,2,3,4,5} -> {1,2,3,4,4} and {1,2,3,4,5} -> {1,2,1,2,1}, but I have no clear idea of what this does to the experimental properties.
Computing orthogonal arrays can be computationally expensive, so designs are generally made available in the form of a library.
The R package DoE.base has an oa.design() function that retrieves a design with a given number of factors and factor levels. For example, to retrieve a design with 3 factors with 3, 4 and 5 levels, use these commands:
library(DoE.base)
oa.design(nlevels=c(3,4,5))
In this case, the returned design is a full factorial with 60 runs. This still is an orthogonal array, but a much more expensive experiment than the alternatives with equal factor levels.
To obtain an orthogonal array for 3 factors with 5 levels each, use:
oa.design(nlevels=c(5,5,5))
A B C
1 1 5 4
2 2 1 5
3 3 4 5
4 3 5 2
5 5 2 4
6 3 3 3
7 5 5 5
8 5 4 3
9 2 5 3
10 5 1 2
11 4 1 3
12 5 3 1
13 4 4 4
14 1 1 1
15 1 2 3
16 3 2 1
17 2 3 4
18 4 3 2
19 4 5 1
20 3 1 4
21 1 3 5
22 1 4 2
23 4 2 5
24 2 2 2
25 2 4 1
Entering 3 factors with 4 levels each returns an orthogonal array of 16 runs, and entering 3 factors with 3 levels each returns an orthogonal array of 9 runs.
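To make the full-factorial case concrete for the factors in the question (5, 4 and 2 levels), the sketch below builds every combination and checks the usual strength-2 balance property: for each pair of factors, every pair of levels occurs equally often. This is an illustration only; full_factorial and is_orthogonal are hypothetical helper names, not functions of DoE.base or any other package.

```python
from collections import Counter
from itertools import combinations, product


def full_factorial(levels):
    """All level combinations, with levels numbered from 1."""
    return [tuple(run) for run in product(*(range(1, n + 1) for n in levels))]


def is_orthogonal(runs, levels):
    """Check that every pair of columns is balanced (strength 2).

    For each pair of factors, every combination of their levels must
    appear in the same number of runs.
    """
    for a, b in combinations(range(len(levels)), 2):
        pairs = levels[a] * levels[b]
        expected, remainder = divmod(len(runs), pairs)
        if remainder:
            return False  # the run count cannot be split evenly
        counts = Counter((run[a], run[b]) for run in runs)
        if len(counts) != pairs or any(n != expected for n in counts.values()):
            return False
    return True


design = full_factorial([5, 4, 2])
print(len(design))                       # 40
print(is_orthogonal(design, [5, 4, 2]))  # True
```

The same check can be applied to a smaller candidate design, such as one obtained by relabelling levels, to see whether the balance survives.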
Alternatively, the Python package OApackage is available on PyPI (https://pypi.org/project/OApackage/).
For more information, see:
Complete Enumeration of Pure-Level and Mixed-Level Orthogonal Arrays, E.D. Schoen, P.T. Eendebak, M.V.M. Nguyen, Journal of Combinatorial Designs, Volume 18, Issue 2, pages 123-140, 2010.
Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions, Pieter T. Eendebak, Eric D. Schoen, Technometrics Vol. 59 , Iss. 1, 2017

Partitioning by counts in SQL

This issue is related to a question I asked here. I have a table that looks like this:
Item Count
1 1
2 4
3 8
4 2
5 6
6 3
I need to group items whose count is, for example, less than 5 into a new group, and the total of each group should be at least 5. The result should look like this:
Item Group Count
1 1 1
2 1 4
3 2 8
4 3 2
5 4 6
6 3 3
How do I achieve this? Many thanks.
Why isn't this a correct result?
Item Group Count
1 1 1
2 2 4
3 3 8
4 4 2
5 5 6
6 1 3
Or this?
Item Group Count
1 1 1
2 2 4
3 3 8
4 4 2
5 5 6
6 6 3
It seems to me that you are trying to solve the problem of 'how to group the items so as to minimize the number of groups and maximize the number of items in each group, without exceeding the limit of 5', which sounds a lot like the Knapsack problem. Perhaps you should read Celko's SQL Stumper: The Class Scheduling Problem and the solutions proposed. Others have also approached this problem, e.g. And now for a completely inappropriate use of SQL Server. Heads up: this is not a trivial problem by any means. Any naive algorithm will die a slow death attempting to solve it on a 1M-row table...
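For comparison, here is a simple greedy pass written in Python rather than SQL. It is a heuristic sketch under one reading of the question (rows with a count of at least 5 get their own group, smaller rows are pooled in arrival order until the pool reaches 5) and makes no claim to be optimal or to cover the general problem discussed above. The function name assign_groups and the threshold parameter are hypothetical.

```python
def assign_groups(rows, threshold=5):
    """Greedy grouping of (item, count) rows, returned as (item, group, count).

    Rows at or above the threshold form their own group. Smaller rows share
    an open group, which is closed once its running total reaches the
    threshold. A final open group may stay below the threshold.
    """
    assignments = []
    next_group = 1
    open_group = None
    open_total = 0
    for item, count in rows:
        if count >= threshold:
            group = next_group
            next_group += 1
        else:
            if open_group is None:
                open_group = next_group  # start a new pool for small rows
                next_group += 1
                open_total = 0
            group = open_group
            open_total += count
            if open_total >= threshold:
                open_group = None  # pool is full, close it
        assignments.append((item, group, count))
    return assignments


rows = [(1, 1), (2, 4), (3, 8), (4, 2), (5, 6), (6, 3)]
for row in assign_groups(rows):
    print(row)
```

On the sample table this happens to reproduce the grouping shown in the question; on other data the result depends on row order.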
