How can I create a query that counts values inside a column? - sql-server

I have 3 companies (1001, 1002, 1003 - it could be more) and 11 containers with different sizes 1, 2, 3, 4, 5. I want to return only the containers in the companies that have the same amount or more of the specified sizes. For example, if I want 2 containers of size 1 and 3 containers of size 2, then only the containers in a company that has 2 or more of size 1 and 3 or more of size 2 should appear. Let's say that only company 1001 has them; then it should appear alone.
I tried different queries and posted one here, but I was told to post a new question for the problem I'm trying to write a query for.
(Company info and container info are in two separate tables.)
This is what I get when I remove HAVING (basically all the containers in the selected city):
CoID CoName ContainerID Price size1 size2 size3 size4 size5
6000001 hbjjvCompany 2000002 50 1 0 0 0 0
6000001 hbjjvCompany 2000003 50 1 0 0 0 0
6000002 NCompany 2000004 50 1 0 0 0 0
6000001 hbjjvCompany 2000005 100 0 1 0 0 0
6000002 NCompany 2000007 100 0 1 0 0 0
6000001 hbjjvCompany 2000008 200 0 0 1 0 0
6000001 hbjjvCompany 2000009 200 0 0 1 0 0
6000001 hbjjvCompany 2000010 200 0 0 1 0 0
6000002 NCompany 2000011 200 0 0 1 0 0
6000001 hbjjvCompany 2000012 400 0 0 0 0 1
6000003 ghhaCo 2000014 200 0 1 0 0 0
What I should get is:
CoID CoName size1 size2 size3 size4 size5
6000001 hbjjvCompany 2 1 3 0 1
Of course I want the container IDs and the price, but I put it here like this to make it clear that my query shows all the containers even if I remove the ContainerID and Price columns.

I think this is what you're looking for:
CREATE TABLE #YourTable(CoID INT,CoName VARCHAR(100),ContainerID INT,Price DECIMAL(10,4),size1 INT,size2 INT,size3 INT,size4 INT,size5 INT);
INSERT INTO #YourTable VALUES
(6000001,'hbjjvCompany',2000002,50,1,0,0,0,0)
,(6000001,'hbjjvCompany',2000003,50,1,0,0,0,0)
,(6000002,'NCompany',2000004,50,1,0,0,0,0)
,(6000001,'hbjjvCompany',2000005,100,0,1,0,0,0)
,(6000002,'NCompany',2000007,100,0,1,0,0,0)
,(6000001,'hbjjvCompany',2000008,200,0,0,1,0,0)
,(6000001,'hbjjvCompany',2000009,200,0,0,1,0,0)
,(6000001,'hbjjvCompany',2000010,200,0,0,1,0,0)
,(6000002,'NCompany',2000011,200,0,0,1,0,0)
,(6000001,'hbjjvCompany',2000012,400,0,0,0,0,1)
,(6000003,'ghhaCo',2000014,200,0,1,0,0,0);
SELECT CoID
,CoName
,SUM(Price) AS SumPrice
,SUM(size1) AS CountSize1
,SUM(size2) AS CountSize2
,SUM(size3) AS CountSize3
,SUM(size4) AS CountSize4
,SUM(size5) AS CountSize5
FROM #YourTable
GROUP BY CoID,CoName;
--Clean up
DROP TABLE #YourTable;
The result:
CoID CoName SumPrice CountSize1 CountSize2 CountSize3 CountSize4 CountSize5
6000003 ghhaCo 200.0000 0 1 0 0 0
6000001 hbjjvCompany 1200.0000 2 1 3 0 1
6000002 NCompany 350.0000 1 1 1 0 0

Related

Repeat rows based on item count for each row and assign values for repeated rows

I have a df with items and their availability in different rooms:
Item Room1 Room2 Room3 Room4
Ball 1 1 1 0
Bat 1 1 1 1
Wicket 1 1 1 0
Now I want to repeat the rows based on the item counts across the different rooms. For example, for Item - Ball there are three 1's in Room1, Room2, Room3, so I need to repeat 3 rows, assigning 0 in each repeated row for only one of the Room1, Room2, Room3 columns; Room4 is not considered for Item Ball and can be 0 in all Ball rows. There are 300 columns with different room names, for example Room1, Room2, Room3, Room4, BlockArea1, Block2, etc. Below is the expected output:
Item Room1 Room2 Room3 Room4
Ball 1 1 1 0
Ball 1 0 1 0
Ball 1 1 0 0
Bat 1 1 1 1
Bat 1 1 1 0
Bat 1 1 0 1
Bat 1 0 1 1
Wicket 1 1 1 0
Wicket 1 0 1 0
Wicket 1 1 0 0
Any help would be appreciated
To have a more interesting example, with a source row containing 0
somewhere else than in the last column, I created df as:
Item Room1 Room2 Room3 Room4
0 Ball 1 1 1 0
1 Bat 1 1 1 1
2 Wicket 1 1 1 0
3 Xxxx 0 1 1 1
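For reference, here is a minimal construction of that df (this snippet is an addition for reproducibility; the column names are taken from the printout above):
import pandas as pd

df = pd.DataFrame({
    'Item':  ['Ball', 'Bat', 'Wicket', 'Xxxx'],
    'Room1': [1, 1, 1, 0],
    'Room2': [1, 1, 1, 1],
    'Room3': [1, 1, 1, 1],
    'Room4': [0, 1, 0, 1],
})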
The first step is to define a function to process each row:
def rowProc(row):
    n = 0
    res = []
    for idx, val in row[row > 0].items():
        outRow = row.copy()
        if n > 0:
            outRow[idx] = 0
        res.append(outRow)
        n += 1
    return pd.DataFrame(res)
An important detail is that the source row comes here from a slightly "changed" DataFrame: the Item column will be set as the index, so the only processed columns are the remaining (Room...) columns.
For the current row it generates a DataFrame containing:
- as many rows as there are ones in the source row,
- a first output row that is an exact copy of the source row (as in your expected result),
- further rows with consecutive ones set to 0.
Then run:
result = pd.concat(df.set_index('Item').apply(rowProc, axis=1).tolist())
result.index.name = 'Item'
result.reset_index(inplace=True)
The result is:
Item Room1 Room2 Room3 Room4
0 Ball 1 1 1 0
1 Ball 1 0 1 0
2 Ball 1 1 0 0
3 Bat 1 1 1 1
4 Bat 1 0 1 1
5 Bat 1 1 0 1
6 Bat 1 1 1 0
7 Wicket 1 1 1 0
8 Wicket 1 0 1 0
9 Wicket 1 1 0 0
10 Xxxx 0 1 1 1
11 Xxxx 0 1 0 1
12 Xxxx 0 1 1 0

How to isolate specific boolean island given a coordinate on it?

Given a 2D array with Boolean islands, where 1 is land and 0 is water, let's say I want only the island to which I point with a coordinate. How would I transfer it to a new array where everything beyond the borders of that island is water?
Here is a simple example.
I am given this 2d array
1 0 0 1 1 0
0 1 0 0 0 1
1 1 1 0 0 0
0 1 0 1 0 1
1 1 1 1 1 0
and the coordinate [1][2] (that would be the 2nd column 3rd row)
Then the final result in the new array should be something like
0 0 0 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 1 0 1 0 0
1 1 1 1 1 0
The pixels can only be connected either up, down, left or right to each other (no diagonals)
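No answer is included here, but the standard approach is a flood fill starting from the given coordinate. Below is a rough Python sketch (an illustration, not code from the original thread; it assumes the grid is a list of lists of 0/1 and the coordinate is given as (row, column)):
from collections import deque

def isolate_island(grid, start):
    # Return a new grid that keeps only the island containing `start`;
    # everything else becomes water (0). 4-connectivity only.
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    if grid[start[0]][start[1]] != 1:
        return out  # the starting point is water
    queue = deque([start])
    out[start[0]][start[1]] = 1
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1 and out[nr][nc] == 0:
                out[nr][nc] = 1
                queue.append((nr, nc))
    return out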

Optimizing 2D grid connectivity algorithm

Summary: I'm looking for an optimal algorithm to ensure connectivity over a 2D grid of binary values. I have a fairly involved algorithm that does it in effectively linear time, but only if certain pre-processing steps are performed. The following goes into fairly extensive detail about the algorithm and its run time. I've also put together a Unity app that offers a detailed visualization of all the steps mentioned below (and some others), which can be found here.
I have a set of scripts that procedurally generate terrain using an algorithm called marching squares. One of the steps is to connect all the regions together. Specifically, I have a grid of 0s (floors) and 1s (walls) and want to ensure that every 0 is reachable from every other 0. I'm optimizing for:
1. The amount of tunneling that needs to be done, i.e. the number of 1s that are turned into 0s should be minimized.
2. Asymptotic run-time. I'm trying to make it linear in the number of tiles in the grid, or as close to linear as possible.
By treating the rooms (connected regions of 0s) as vertices and potential tunnels as edges, we can use a minimum spanning tree algorithm as our workhorse. I'll describe the algorithm from the starting point of an unconnected grid of 0s and 1s.
Input:
A 2d array of bytes, either 0 or 1, representing terrain (0: floor, 1: wall).
e.g. the following has four 'rooms' (connected components of floors).
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1
1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Output
The same grid, such that each room can be reached from any other room, with
the least amount of damage done to the grid (fewest 1s flipped to 0s). Here we've carved a total of three tunnels:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Overview of basic algorithm:
The following is a high level description of the algorithm, without several crucial optimizations, in order to illustrate the high level ideas:
1. Run a BFS on the grid to find rooms (connected components of floors), storing only edge tiles (i.e. floor tiles adjacent to a wall tile), since the shortest path between two rooms will always be between two edge tiles.
2. For each pair of rooms, do a double loop to find the pair of tiles with the shortest Euclidean distance. This pair forms a potential tunnel between the two rooms, created by digging a straight path between them.
3. Treat the rooms from (1) as vertices and the pairs from (2) as edges in a graph, with the weights being Euclidean distance. Run Kruskal's Minimum Spanning Tree algorithm on this graph to acquire a list of tunnels to dig that minimizes the number of tiles that need to be changed.
This guarantees connectivity, and it guarantees the absolute minimum number of changed tiles (caveat: this is false if we consider the possibility of connecting a room not to another room, but to a tunnel between another pair of rooms). The issue is that it scales poorly: step (2) scales quadratically in the number of tiles in the grid.
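As a concrete illustration of steps (2) and (3), here is a rough Python sketch (not code from the original post; it assumes `rooms` is a list of rooms, each a list of (row, col) edge-tile coordinates produced by the BFS in step (1)):
from math import dist

def closest_pair(room_a, room_b):
    # Step (2), brute force: the pair of edge tiles with the smallest distance.
    best = None
    for a in room_a:
        for b in room_b:
            d = dist(a, b)
            if best is None or d < best[0]:
                best = (d, a, b)
    return best

def find(parent, i):
    # Union-Find lookup with path compression.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def minimum_spanning_tunnels(rooms):
    # Step (3): Kruskal's MST over the room graph; returns the tile pairs to dig.
    edges = []
    for i in range(len(rooms)):
        for j in range(i + 1, len(rooms)):
            d, a, b = closest_pair(rooms[i], rooms[j])
            edges.append((d, i, j, a, b))
    edges.sort()
    parent = list(range(len(rooms)))
    tunnels = []
    for d, i, j, a, b in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tunnels.append((a, b))
    return tunnels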
Optimized algorithm
The bottleneck is with step (2). We're meticulously checking every single pair of tiles for every pair of rooms to ensure we get the absolute smallest connection. If we accept a little bit of error (i.e. a suboptimal connection) we can speed it up dramatically. The basic idea is to skip a number of tiles proportional to the distance we just computed: if we compute a large distance between tile A and tile B, then chances are that we're nowhere near an optimal connection, so we can skip checking the nearby tiles. Any error from this skipping will be proportional to the length of the optimal connection.
To explain this visually, suppose X and Y represent a current pair of tiles being checked, and that we're currently looping X over the room on the left.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 X 0 0 0 0 0 0 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 1 1
1 1 0 0 0 0 0 0 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 0 0 0 1
1 1 0 0 0 0 1 1 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 0 0 0 Y 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
These have a distance of 11, so let's skip 11 tiles (marked with dashes):
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 - - - - - - 1 1 1 1 1 1 1
1 1 0 0 0 0 0 - - 1 1 1 1 1 1
1 1 0 0 0 0 - - 1 1 1 1 1 1 1
1 0 0 0 0 X - 1 1 1 1 0 0 0 1
1 1 0 0 0 0 1 1 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 0 0 0 Y 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Most comparisons will be far apart, so this dramatically reduces the run time of step 2, and in practice, produces minimal error.
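A rough sketch of that skipping loop (an illustration, not code from the original post; path_a and path_b are assumed to hold each room's edge tiles in path order, which the sorting step described below provides):
from math import dist

def closest_pair_approx(path_a, path_b):
    # Approximate closest pair: after comparing two tiles at distance d,
    # jump ahead roughly d tiles before the next comparison.
    best = None
    i = 0
    while i < len(path_a):
        j, d = 0, 1
        while j < len(path_b):
            d = dist(path_a[i], path_b[j])
            if best is None or d < best[0]:
                best = (d, path_a[i], path_b[j])
            j += max(1, int(d))   # inner skip, proportional to the distance just computed
        i += max(1, int(d))       # outer skip, as in the X/Y illustration above
    return best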
The one issue being overlooked here is that this assumes the edge tiles are ordered: this requires an additional preprocessing step.
Thus here's the optimized algorithm:
1. Perform the BFS as in the previous algorithm.
2. Sort each room. This can be done by executing a depth-first search along the edge tiles. This will give us a roughly continuous path along the edge of the room. There are a few (literal) corner cases where the path can jump, but the paths are continuous within reasonable approximation.
3. Perform the double loop in step 2 of the previous algorithm, but this time, instead of incrementing by one tile, increment by the last computed distance.
4. Perform Kruskal's algorithm as before. Note that Kruskal's requires that we sort the edges in the graph: since the graph is complete (every pair of rooms has a potential tunnel), a standard sort becomes the new bottleneck in the algorithm. Since we're sorting by distance, which is a float, we can achieve a much faster sort by truncating the floats and turning them into integers. Again, this produces minimal error (Kruskal's might choose a tunnel with distance 4.6 over a tunnel with distance 4.2, for example) but offers a dramatic speedup.
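For the truncation trick in step 4, a minimal sketch (an addition, not from the original post; it assumes edge tuples of the form (length, ...) and that max_len is an upper bound on any tunnel length, e.g. the grid diagonal):
def sort_edges_by_truncated_length(edges, max_len):
    # Bucket edges by int(length) instead of comparison-sorting floats:
    # O(E + max_len) time, at the cost of the small error described above.
    buckets = [[] for _ in range(max_len + 1)]
    for edge in edges:
        buckets[int(edge[0])].append(edge)
    return [edge for bucket in buckets for edge in bucket]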
Run-time analysis
Let n denote the number of tiles in the grid. Let m denote the number of rooms (connected components) in the grid. Note that in general, in the worst case, m = O(n).
There are four steps in the algorithm. (1) and (2) are both O(n) in memory and time, as they are a BFS and DFS that processes each tile in the map at most once.
(3) is a bit trickier. We're doing a double loop over every room, and then finding a connection. This is O(m^2) for the double loop, multiplied by the average work done per pair of rooms. Giving a tight analytical bound for the work done on average per pair of rooms is not a straightforward matter. But empirically, testing over a large variety of configurations, as n grows large, the average work converges to a single comparison. This is because the average distance of tiles between rooms grows as the grid grows.
So in total the work done for (3) is O(m^2), with O(1) storage.
For (4), it's given by the runtime for Kruskal's, which is O(m^2 log m^2) to sort the edges naively and O(m^2 a(m^2)) to run the edges through the UnionFind data structure, where a is the inverse Ackermann function (effectively a constant). If we truncate the edge lengths and use an integer sorting algorithm, we can get the sort down to O(m^2). Storage is O(m^2).
So in total, the runtime is dominated by Kruskal's algorithm, given by O(m^2 a(m^2)), or effectively, O(m^2). Given that m = O(n) in the worst case, this is not very good performance. But we can do one final preprocessing step on the grid to get it down to O(n), which is to limit the number of rooms in an organic way.
Prior to any of the other steps, we can use a floodfill algorithm to fill in small rooms in linear time: specifically, we can fill in any room of size less than sqrt(n). Since there can only be at most sqrt(n) rooms of size at least sqrt(n) in the grid, it follows that m = O(sqrt(n)), making the entire algorithm linear in the size of the grid.
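A sketch of that pre-processing pass (an illustration, not from the original post; grid is assumed to be a mutable 2D array of 0s and 1s):
def fill_small_rooms(grid, min_size):
    # Flood-fill (turn back into wall) every room smaller than min_size,
    # so that at most n / min_size rooms remain.
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                seen[r][c] = True
                stack, room = [(r, c)], []
                while stack:
                    y, x = stack.pop()
                    room.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(room) < min_size:
                    for y, x in room:
                        grid[y][x] = 1
Choosing min_size around sqrt(n) gives the m = O(sqrt(n)) bound described above.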
Conclusion
Is it possible to do better than this? Obviously we cannot do asymptotically better than linear time and storage, but in order to achieve those figures, a certain amount of sloppily quantified suboptimality in the tunnel lengths is accepted, and it requires modifying the original grid (namely, putting a bound on the number of rooms).

Matlab finding the center of cluster of a few pixels and counting the clusters

So I have this matrix A, which is made of ones and zeros. I have about 10 to 14 white spots of many pixels, but I want only one white pixel/center coordinate for every cluster of white. How do I calculate how many clusters there are and their centers?
Try to imagine the matrix A as the night sky with white stars in a black sky: how do I count the stars and find the stars' centers, given that the stars are made of clusters of white pixels?
Also, the clusters are not all exactly the same size.
Here is some code using bwlabel and/or regionprops, which are used to identify connected components in a matrix and a bunch of other properties, respectively. I think it suits your problem quite well; however you might want to adapt my code a bit, as it's more of a starting point.
clear
clc
%// Create dummy matrix.
BW = logical ([ 1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 0 0 0]);
%// Identify clusters.
L = bwlabel(BW,4)
Matrix L looks like this:
L =
1 1 1 0 3 3 3 0
1 1 1 0 3 3 3 0
1 1 1 0 3 3 3 0
0 0 0 0 0 0 0 0
0 0 0 0 0 4 4 0
2 2 2 2 0 4 4 0
2 2 2 2 0 4 4 0
2 2 2 2 0 0 0 0
Here you have many ways to locate the center of the clusters. The first one uses the output of bwlabel to find each cluster and calculate the coordinates in a loop. It works and it's didactic, but it's a bit long and not so efficient. The 2nd method, as mentioned by @nkjt, uses regionprops, which does exactly what you want via the 'Centroid' property. So here are the 2 methods:
Method 1: a bit complicated
So bwlabel identified 4 clusters, which makes sense. Now we need to identify the center of each of those clusters. My method could probably be simplified, but I'm a bit out of time, so feel free to modify it as you see fit.
%// Get number of clusters.
NumClusters = numel(unique(L)) - 1;
Centers = zeros(NumClusters,2);
CenterLinIdices = zeros(NumClusters,1);
for k = 1:NumClusters
    %// Find indices for elements forming each cluster.
    [r, c] = find(L==k);
    %// Sort the elements to know how many rows and columns the cluster is spanning.
    [~,y] = sort(r);
    c = c(y);
    r = r(y);
    NumRow = numel(unique(r));
    NumCol = numel(unique(c));
    %// Calculate the approximate center of the cluster.
    CenterCoord = [r(1)+floor(NumRow/2) c(1)+floor(NumCol/2)];
    %// Actually this array is not used here but you might want to keep it for future reference.
    Centers(k,:) = [CenterCoord(1) CenterCoord(2)];
    %// Convert the subscript indices to linear indices for easy reference.
    CenterLinIdices(k) = sub2ind(size(BW),CenterCoord(1),CenterCoord(2));
end
%// Create output matrix full of 0s, except at the center of the clusters.
BW2 = false(size(BW));
BW2(CenterLinIdices) = 1
BW2 =
0 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
Method 2: using regionprops and the 'Centroid' property.
Once you have matrix L, apply regionprops and concatenate the output to get an array containing the coordinates directly. Much simpler!
%// Create dummy matrix.
BW = logical ([ 1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
1 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 1 1 0
1 1 1 1 0 0 0 0]);
%// Identify clusters.
L = bwlabel(BW,4)
s = regionprops(L,'Centroid');
CentroidCoord = vertcat(s.Centroid)
which gives this:
CentroidCoord =
2.0000 2.0000
2.5000 7.0000
6.0000 2.0000
6.5000 6.0000
Which is much simpler and gives the same output once you use floor.
Hope that helps!
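As a side note (not part of the original answer), readers working in Python can get essentially the same result with SciPy's ndimage module, assuming numpy and scipy are available:
import numpy as np
from scipy import ndimage

BW = np.array([[1,1,1,0,1,1,1,0],
               [1,1,1,0,1,1,1,0],
               [1,1,1,0,1,1,1,0],
               [0,0,0,0,0,0,0,0],
               [0,0,0,0,0,1,1,0],
               [1,1,1,1,0,1,1,0],
               [1,1,1,1,0,1,1,0],
               [1,1,1,1,0,0,0,0]], dtype=bool)

# Label 4-connected clusters (ndimage.label uses 4-connectivity in 2D by default).
L, num_clusters = ndimage.label(BW)
# One (row, col) centroid per cluster, analogous to regionprops 'Centroid'
# (note that MATLAB's Centroid is reported as (x, y), i.e. column first).
centers = ndimage.center_of_mass(BW, L, range(1, num_clusters + 1))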

Rotate a vector of arbitrary length circularly within an array about some point x,y in Matlab

I have an array:
1 1 1 0 0
1 2 2 0 0
1 2 3 0 0
0 0 0 0 0
0 0 0 0 0
I want to make it
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
It is like rotating a 1/4 piece of pie by 270 degrees to fill out the remaining parts of the pie and make a full circle - essentially mirroring the entire corner in all directions. I don't want to use any built-in MATLAB features if possible - just some vector tricks. Thanks.
EDIT:
This is embedded within a matrix of zeros of arbitrary size. I want it to work in both the above example and, say, this example:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 0 0 0 0 0 0 0 0 0
0 0 1 2 3 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ideally, I want to have a vector, say [1,2,3,...,N], which can be rotated circularly about the highest value in the array (N), centered about some point xc,yc in the grid. Or, if this isn't possible, take a base array [1 1 1, 1 2 2, 1 2 3] and rotate it such that 3 is in the centre and you fill a circle as in the 2nd matrix above.
EDIT:
I found that rot90(M,k) rotates matrix M by 90 degrees k times, but this produces:
Mrot = M + rot90(M,1) + rot90(M,2) + rot90(M,3)
Mrot =
1 1 2 1 1
1 2 4 2 1
2 4 12 4 2
1 2 4 2 1
1 1 2 1 1
This stacks it in the x,y directions which isn't correct.
Assuming the corner you want to replicate is symmetric about the diagonal (as in your example), you can do this in one indexing step. Given a matrix M containing your sample 5-by-5 matrix, here's how to do it:
>> index = [1 2 3 2 1];
>> M = M(index, index)
M =
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
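As an addition (not part of the original answer), here is a rough NumPy sketch of the same indexing trick, extended to place the mirrored block at an arbitrary center (xc, yc) inside a larger array of zeros; it assumes the corner's peak value sits in its bottom-right cell and that the mirrored block fits inside the target shape:
import numpy as np

def mirror_corner(corner, shape, center):
    # Mirror a (k+1)-by-(k+1) corner about its last row and column and
    # drop the resulting (2k+1)-by-(2k+1) block into a zero array,
    # with the peak landing at `center` = (row, col).
    k = corner.shape[0] - 1
    idx = list(range(k + 1)) + list(range(k - 1, -1, -1))  # 0..k..0, same trick as M(index,index)
    block = corner[np.ix_(idx, idx)]
    out = np.zeros(shape, dtype=corner.dtype)
    r, c = center
    out[r - k:r + k + 1, c - k:c + k + 1] = block
    return out

corner = np.array([[1, 1, 1],
                   [1, 2, 2],
                   [1, 2, 3]])
print(mirror_corner(corner, (6, 14), (3, 4)))  # fills out the second example grid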
