I have a database table with rows that each contain a sequential index. I want to select groups of rows that are consecutive based upon this index column. For example, if I had rows with the following index values:
1
3
4
5
7
9
10
11
12
15
16
and I wanted to select all groups with 3 consecutive indices (this number will vary). I would get the following groups:
3, 4, 5
9, 10, 11
10, 11, 12
Basically, I'm trying to achieve something similar to the question posed here:
selecting consecutive numbers using SQL query
However, I want to implement this with LINQ to Entities, not actual SQL. I would also prefer not to use stored procedures, and I don't want to do any sort of ToList/looping approach.
Edit: Groups with more than the requested number of consecutive elements don't necessarily need to be split apart; i.e., in the previous example, a result of 9, 10, 11, 12 would also be acceptable.
So I think I've come up with a pretty good solution modeled after Brian's answer in the topic I linked to.
var q = from a in query
        from b in query
        where a.Index < b.Index
           && b.Index < a.Index + 3
        group b by new { a.Index } into myGroup
        where myGroup.Count() + 1 == 3
        select myGroup.Key.Index;
Change 3 to the number of consecutive rows you want. This gives you the first index of every group of consecutive rows. Applied to the original example I provided, you would get:
3
9
10
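If you need the rows themselves rather than just the starting indices, one option (a sketch of my own, assuming the same query source and a group size of 3) is to join the starting indices back against the source:

var rows = from start in q
           from row in query
           where row.Index >= start && row.Index < start + 3
           orderby start, row.Index
           select new { GroupStart = start, row.Index };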
I think this might work pretty efficiently (LINQ to Objects rather than LINQ to Entities, though):
int[] query = { 1, 3, 4, 5, 7, 9, 10, 11, 12, 15, 16 };
int count = 3;
List<List<int>> numbers = query
    .Where(p => query.Where(q => q >= p && q < p + count).Count() == count)
    .Select(p => Enumerable.Range(p, count).ToList())
    .ToList();
using (var model = new AlbinTestEntities())
{
    var triples = from t1 in model.Numbers
                  from t2 in model.Numbers
                  from t3 in model.Numbers
                  where t1.Number + 1 == t2.Number
                  where t2.Number + 1 == t3.Number
                  select new
                  {
                      t1 = t1.Number,
                      t2 = t2.Number,
                      t3 = t3.Number,
                  };
    foreach (var res in triples)
    {
        Console.WriteLine(res.t1 + ", " + res.t2 + ", " + res.t3);
    }
}
It generates the following SQL
SELECT
    [Extent1].[Number] AS [Number],
    [Extent2].[Number] AS [Number1],
    [Extent3].[Number] AS [Number2]
FROM [dbo].[Numbers] AS [Extent1]
CROSS JOIN [dbo].[Numbers] AS [Extent2]
CROSS JOIN [dbo].[Numbers] AS [Extent3]
WHERE (([Extent1].[Number] + 1) = [Extent2].[Number]) AND (([Extent2].[Number] + 1) = [Extent3].[Number])
It might be even better to use an inner join like this
using (var model = new AlbinTestEntities())
{
    var triples = from t1 in model.Numbers
                  join t2 in model.Numbers on t1.Number + 1 equals t2.Number
                  join t3 in model.Numbers on t2.Number + 1 equals t3.Number
                  select new
                  {
                      t1 = t1.Number,
                      t2 = t2.Number,
                      t3 = t3.Number,
                  };
    foreach (var res in triples)
    {
        Console.WriteLine(res.t1 + ", " + res.t2 + ", " + res.t3);
    }
}
but when I compare the resulting queries in Management Studio, they generate the same execution plan and take exactly the same time to execute. I only have this limited dataset; if yours is larger, you might compare the performance of both queries on your data and pick the faster one if they differ.
The following code will find every "root".
var query = this.commercialRepository.GetQuery();
var count = 2;
for (int i = 0; i < count; i++)
{
    query = query.Join(query, outer => outer.Index + 1, inner => inner.Index, (outer, inner) => outer);
}
var dummy = query.ToList();
It will only find the first item in each group, so you will either have to modify the query to remember the other ones, or make a follow-up query that uses the roots to fetch the remaining indexes. I'm sorry I couldn't wrap it up before I had to go, but maybe it helps a bit.
PS: if count is 2, as in this case, it finds groups of 3.
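As a sketch of that follow-up query (my own naming, assuming the items expose an Index property and that count + 1 is the group size), you could expand each root into its full run of indexes, at the cost of one query per root:

var roots = query.ToList(); // the roots found above
var source = this.commercialRepository.GetQuery();
var groups = roots
    .Select(root => source
        .Where(x => x.Index >= root.Index && x.Index <= root.Index + count)
        .OrderBy(x => x.Index)
        .ToList())
    .ToList();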
Related
I have up to 16 datasets (only using 8 for the example here), and I'm trying to sort them into groups of 4, where the datasets within each group are as closely matched as possible. (Using a VBA macro in Excel).
My aim was to iterate through every possible combination of groups of 4 and compare how well matched they are to the previous "best match", overwriting that if so.
I've got no problems comparing how well matched the groups are, but the code I have won't iterate through every possible combination.
My question is why doesn't this code work? And if there is a better solution please let me know.
For a = 1 To UBound(Whitelist) - 3
    For b = a + 1 To UBound(Whitelist) - 2
        For c = b + 1 To UBound(Whitelist) - 1
            For d = c + 1 To UBound(Whitelist)
                TempGroups(1, 1) = a: TempGroups(1, 2) = b: TempGroups(1, 3) = c: TempGroups(1, 4) = d
                For e = 1 To UBound(Whitelist) - 3
                    If InArray(TempGroups, e) = False Then
                        For f = e + 1 To UBound(Whitelist) - 2
                            If InArray(TempGroups, f) = False Then
                                For g = f + 1 To UBound(Whitelist) - 1
                                    If InArray(TempGroups, g) = False Then
                                        For h = g + 1 To UBound(Whitelist)
                                            If InArray(TempGroups, h) = False Then
                                                TempGroups(2, 1) = e: TempGroups(2, 2) = f: TempGroups(2, 3) = g: TempGroups(2, 4) = h
                                                If HowClose(Differences, TempGroups, 1) + HowClose(Differences, TempGroups, 2) < HowClose(Differences, Groups, 1) + HowClose(Differences, Groups, 2) Then
                                                    For x = 1 To 4
                                                        For y = 1 To 4
                                                            Groups(x, y) = TempGroups(x, y)
                                                        Next y
                                                    Next x
                                                End If
                                            End If
                                        Next h
                                    End If
                                Next g
                            End If
                        Next f
                    End If
                Next e
            Next d
        Next c
    Next b
Next a
For reference, UBound(Whitelist) can be taken as 8 (the number of datasets I have to match).
TempGroups is an array which I'm writing each iteration to, so it can be compared to...
Groups, the array which will contain the data sorted into matched groups
The InArray function checks to see if the value is already allocated to a group, as the datasets can only be in one group each.
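For reference, a minimal sketch of what such an InArray function might look like (my guess, assuming the groups arrays hold whole-number indices and unused slots are 0):

Function InArray(arr As Variant, val As Long) As Boolean
    ' Returns True if val appears anywhere in the 2-D array arr
    Dim i As Long, j As Long
    InArray = False
    For i = LBound(arr, 1) To UBound(arr, 1)
        For j = LBound(arr, 2) To UBound(arr, 2)
            If arr(i, j) = val Then
                InArray = True
                Exit Function
            End If
        Next j
    Next i
End Function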
Thanks in advance!
[Images in the original post: Datasets; Relatively Well Matched Data; Fairly Poorly Matched Data]
Consider having this table in MATLAB:
t = table([1; 0; 3; 1], [0; 1; 0; 4], 'VariableNames', {'A', 'B'});
A B
_ _
1 0
0 1
3 0
1 4
I want to append a new column C with a specific value that is based on a condition. Currently I use this loop:
for i=1:height(t)
    if t(i, 'A').Variables == 1
        t.C(i, 1) = 4;
    elseif t(i, 'A').Variables == 3
        t.C(i, 1) = 5;
    end
end
However, this is a time-consuming operation for tables with more than 100k rows.
What would be the best solution for this?
You can use the following:
[t{t.A==1, 'C'}] = 4;
[t{t.A==3, 'C'}] = 5;
This uses the facts that
Table contents can be indexed via {}, as in cell arrays;
Table columns can be indexed by their name.
Or, as noted by @SardarUsama, you can use the simpler
t.C(t.A==1)=4;
t.C(t.A==3)=5;
This uses dot notation to index the column. The result is a numerical column vector, to which a scalar can be directly assigned.
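As a quick usage example (a sketch; note that rows matching neither condition keep MATLAB's numeric fill value 0 when C is first created by this assignment):

t = table([1; 0; 3; 1], [0; 1; 0; 4], 'VariableNames', {'A', 'B'});
t.C(t.A==1) = 4;
t.C(t.A==3) = 5;
disp(t)   % C comes out as [4; 0; 5; 4]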
Suppose I have two arrays ordered in an ascending order, i.e.:
A = [1 5 7], B = [1 2 3 6 9 10]
I would like to create from B a new vector B', which contains only the closest values to A values (one for each).
I also need the indexes. So, in my example I would like to get:
B' = [1 6 9], Idx = [1 4 5]
Note that the third value is 9. Indeed, 6 is closer to 7, but it is already 'taken' since it is the closest to 5.
Any idea for suitable code?
Note: my true arrays are much larger and contain real (not int) values
Also, it is given that B is longer than A.
Thanks!
Assuming you want to minimize the overall discrepancies between elements of A and matched elements in B, the problem can be written as an assignment problem of assigning to every row (element of A) a column (element of B) given a cost matrix C. The Hungarian (or Munkres') algorithm solves the assignment problem.
I assume that you want to minimize cumulative squared distance between A and matched elements in B, and use the function [assignment,cost] = munkres(costMat) by Yi Cao from https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-:
A = [1 5 7];
B = [1 2 3 6 9 10];
[Bprime,matches] = matching(A,B)
function [Bprime,matches] = matching(A,B)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [matches,~] = munkres(C);
    Bprime = B(matches);
end
Assuming instead you want to find matches recursively, as suggested by your question, you could either walk through A, for each element in A find the closest remaining element in B and discard it (sortedmatching below); or you could iteratively form and discard the distance-minimizing match between remaining elements in A and B until all elements in A are matched (greedymatching):
A = [1 5 7];
B = [1 2 3 6 9 10];
[~,~,Bprime,matches] = sortedmatching(A,B,[],[])
[~,~,Bprime,matches] = greedymatching(A,B,[],[])
function [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches)
    [~,ix] = min((A(1) - B).^2);
    matches = [matches ix];
    Bprime = [Bprime B(ix)];
    A = A(2:end);
    B(ix) = Inf;
    if(not(isempty(A)))
        [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches);
    end
end
function [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [minrows,ixrows] = min(C);
    [~,ixcol] = min(minrows);
    ixrow = ixrows(ixcol);
    matches(ixrow) = ixcol;
    Bprime(ixrow) = B(ixcol);
    A(ixrow) = -Inf;
    B(ixcol) = Inf;
    if(max(A) > -Inf)
        [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches);
    end
end
While producing the same results in your example, the three methods can give different answers on the same data. For instance, with A = [0 2] and B = [1.9 5], sortedmatching (and the Hungarian method) returns Bprime = [1.9 5], whereas greedymatching pairs 2 with 1.9 first and returns Bprime = [5 1.9].
Normally I would run screaming from for and while loops in Matlab, but in this case I cannot see how the solution could be vectorized. At least it is O(N) (or near enough, depending on how many equally-close matches to each A(i) there are in B). It would be pretty simple to code the following in C and compile it into a mex file, to make it run at optimal speed, but here's a pure-Matlab solution:
function [out, ind] = greedy_nearest(A, B)
    if nargin < 1, A = [1 5 7]; end
    if nargin < 2, B = [1 2 3 6 9 10]; end
    ind = A * 0;
    walk = 1;
    for i = 1:numel(A)
        match = 0;
        lastDelta = inf;
        while walk <= numel(B)   % <= so the last element of B is also considered
            delta = abs(B(walk) - A(i));
            if delta < lastDelta, match = walk; end
            if delta > lastDelta, break, end
            lastDelta = delta;
            walk = walk + 1;
        end
        ind(i) = match;
        walk = match + 1;
    end
    out = B(ind);
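A quick usage check with the example data from the question:

[out, ind] = greedy_nearest([1 5 7], [1 2 3 6 9 10])
% out = [1 6 9], ind = [1 4 5]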
You could first get the distance from each value in A to each value in B, sort the candidate indices by distance, and then, walking down each column, take the first index that has not already been used.
% Get the distance from each value in A to each value in B, and sort
% the row indices (into B) by increasing distance within each column
[~, minIdx] = sort(abs(bsxfun(@minus, A, B.')));
% Walking down each column, take the first index that is not yet used
idx = zeros(size(A));
for iCol = 1:numel(A)
    for iRow = 1:iCol
        if ~ismember(minIdx(iRow,iCol), idx)
            idx(iCol) = minIdx(iRow,iCol);
            break
        end
    end
end
The result when applying idx to B
>> idx
1 4 5
>> B(idx)
1 6 9
I have the following four nested loops in Matlab:
timesteps = 5;
inputsize = 10;
additionalinputsize = 3;
outputsize = 7;
input = randn(timesteps, inputsize);
additionalinput = randn(timesteps, additionalinputsize);
factor = randn(inputsize, additionalinputsize, outputsize);
output = zeros(timesteps,outputsize);
for t=1:timesteps
    for i=1:inputsize
        for o=1:outputsize
            for a=1:additionalinputsize
                output(t,o) = output(t,o) + factor(i,a,o) * input(t,i) * additionalinput(t,a);
            end
        end
    end
end
There are three vectors: an input vector, an additional input vector and an output vector. All three are connected by the factors. Every vector has values at given timesteps, and I need the sum over all combinations of inputs, additional inputs and factors at every given timestep. Later, I need to calculate back from the output to the input:
result2 = zeros(timesteps,inputsize);
for t=1:timesteps
    for i=1:inputsize
        for o=1:outputsize
            for a=1:additionalinputsize
                result2(t,i) = result2(t,i) + factor(i,a,o) * output(t,o) * additionalinput(t,a);
            end
        end
    end
end
In a third case, I need the product of all three vectors summed over every timestep:
product = zeros(inputsize,additionalinputsize,outputsize);
for t=1:timesteps
    for i=1:inputsize
        for o=1:outputsize
            for a=1:additionalinputsize
                product(i,a,o) = product(i,a,o) + input(t,i) * output(t,o) * additionalinput(t,a);
            end
        end
    end
end
These code snippets work but are incredibly slow. How can I remove the nested loops?
Edit: Added values and changed minor things so the snippets are executable
Edit2: Added other use case
First Part
One approach -
t1 = bsxfun(@times,additionalinput,permute(input,[1 3 2]));
t2 = bsxfun(@times,t1,permute(factor,[4 2 1 3]));
t3 = permute(t2,[2 3 1 4]);
output = squeeze(sum(sum(t3)));
Or a slight variant to avoid squeeze -
t1 = bsxfun(@times,additionalinput,permute(input,[1 3 2]));
t2 = bsxfun(@times,t1,permute(factor,[4 2 1 3]));
t3 = permute(t2,[1 4 2 3]);
output = sum(sum(t3,3),4);
Second Part
t11 = bsxfun(@times,additionalinput,permute(output,[1 3 2]));
t22 = bsxfun(@times,permute(t11,[1 4 2 3]),permute(factor,[4 1 2 3]));
result2 = sum(sum(t22,3),4);
Third Part
t11 = bsxfun(@times,permute(output,[4 3 2 1]),permute(additionalinput,[4 2 3 1]));
t22 = bsxfun(@times,permute(input,[2 4 3 1]),t11);
product = sum(t22,4);
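As a sanity check for the first part (a sketch of my own, not part of the original answer): each of these sums is a tensor contraction, so you can verify the bsxfun result against a per-output matrix-product formulation:

% For each output o: output(t,o) = sum over i,a of
% input(t,i) * factor(i,a,o) * additionalinput(t,a),
% i.e. a bilinear form in input and additionalinput with matrix factor(:,:,o)
output_check = zeros(timesteps, outputsize);
for o = 1:outputsize
    output_check(:,o) = sum((input * factor(:,:,o)) .* additionalinput, 2);
end
% max(abs(output_check(:) - output(:))) should be ~0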
I have an enum
public enum Positions : byte
{
    Manager = 0,
    CEO = 1,
    Lawyer = 2,
    Intern = 3,
    Janitor = 4,
}
Is it possible to get a subset of these enums to bind to a ComboBox in WPF? Say only those enum values >= 0 and <= 2? I was trying:
var subset = from p in Positions where p <= 2 && p >= 0 select p;
myComboBox.ItemsSource = subset;
without success (Positions is flagged as an error with "Could not find an implementation of the query pattern...").
I was thinking that this would be nice to use LINQ on, but if there's another simple way, that would be interesting too.
Thanks,
Dave
You'll need to get the enum values and cast them to the proper type:
var subset = from p in Enum.GetValues(typeof(Positions)).Cast<int>()
             where p <= 2 && p >= 0
             select (Positions)p;
The cast back at the end is unnecessary if you cast the values to Positions up front:
var subset = from p in Enum.GetValues(typeof(Positions)).Cast<Positions>()
             where p <= Positions.Lawyer && p >= Positions.Manager
             select p;