Distinct values from a field in vespa - vespa

I'm using vespa to view some data. Consider the following data
id product brand
1 a b1
2 b b1
3 c b1
4 d b2
5 e b3
I tried grouping to display the data from brand field. I had a field with price and I wrote a query like this
SELECT * FROM s_data where default contains "soap" | all(group(brand) each(output(sum(price))));
Basically, I don't want to calculate the sum of price, all I want is distinct values from the field 'brand'. Is there a way to do that in vespa?

all(group(brand) each(output(count())))
Gives you all the unique values of the brand field attribute along with their occurrences count. If you really don't need the count you can ignore it in the output.

Related

Alternative solutions to an array search in PostgreSQL

I am not sure if my database design is good for this tricky case and I also ask for help how the query for this could look like.
I plan a query with the following table:
search_array | value | id
-----------------------+-------+----
{XYa,YZb,WQb} | b | 1
{XYa,YZb,WQb,RSc,QZa} | a | 2
{XYc,YZa} | c | 3
{XYb} | a | 4
{RSa} | c | 5
There are 5 main elements in the search_array: XY, YZ, WQ, RS, QZ and 3 Values: a, b, c that are concardinated to each element.
Each row has also one value: a, b or c.
My aim is to find all rows that fit to a specific row in this sense: At first it should be checked if they have any same main elements in their search_arrays (yellow marked in the example).
As example:
Row id 4 an row id 5 wouldnt match because XY != RS.
Row id 1, 2 and 3 would match two times because they have all XY and YZ.
Row id 1 and 2 would even match three times because they have also WQ in common.
And second: if there is a Main Element match it should be 'crosschecked' if the lowercase letters after the Main Elements fit to the value of the other row.
As example: The only match for Row id 1 in the table would be Row id 4 because they both search for XY and the low letters after the elements match each value of the two rows.
Another match would be ROW id 2 and 5 with RS and search c to value c and search a to value a (green and orange marked).
My idea was to cut the search_array elements in the query in two parts with the RIGHT and LEFT command for strings. But I dont know how to combine the subqueries for this search.
Or would be a complete other solution faster? Like splitting the search array into another table with the columns 'foregin key' to the maintable, 'main element' and 'searched_value'. I am not sure if this is the best solution because the program would all the time switch to the main table to find two rows out of 3 million rows to compare their searched_values to the values?
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
SELECT t.id, t.value,
substr(u.val, 1, 2) AS arr_main,
substr(u.val, 3, 1) AS arr_val
FROM mytable AS t
CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
a.value AS first_value,
b.id AS second_id,
b.value AS second_value,
a.arr_main AS main_element
FROM unravel AS a
JOIN unravel AS b
ON a.arr_main = b.arr_main
AND a.arr_val = b.value
AND b.arr_val = a.value;

Counting rows in a table based on multiple array criterias

I am trying to count rows in a table based on multiple criteria in different columns of that table. The criteria are not directly in the formula though; they are arrays which I would like to refer to (and not list them in the formula directly).
Range table example:
Group Status
a 1
b 4
b 6
c 4
a 6
d 5
d 4
a 2
b 2
d 3
b 2
c 1
c 2
c 1
a 4
b 3
Criteria/arrays example:
group
a
b
status
1
2
3
I am able to do this if i only have one array search through a range (1 column) in that table:
{=SUM(COUNTIFS(data[Group],group[group]))}
Returns "9" as expected (=sum of rows in the group column of the data table which match any values in group[group])
But if I add a second different range and a different array I get an incorrect result:
{=SUM(COUNTIFS(data[Group],group[group], data[Status],status[status]))}
Returns "3" but should return "5" (=sum of rows which have 1, 2 or 3 in the status column and a or b in the group column)
I searched and googled for various ideas related to using sumproduct or defining arrays within the formula instead of classifying the whole formula as an array but I was not able to get expected results via those means.
Thank you for your help.
Because your group and status criteria are a different number of values (2 values for group, but 3 values for status), I'm not sure you can do this in a single formula. Best way I know of to do this would be to use a helper column (which can be hidden if preferred).
Put this array formula in a helper column and copy down the length of your data (array formulas must be confirmed with Ctrl+Shift+Enter):
=AND(OR(data[#Group]=group[group]),OR(data[#Status]=status[status]))
And then get the count with: =COUNTIF(helpercolumn,TRUE)
You could use a slightly different approach, using Power Query / Power Pivot.
Name your tables Data, Group and Status, then create the following query, named Filtered Data:
let
tbData = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
tbGroup = Excel.CurrentWorkbook(){[Name="Group"]}[Content],
tbStatus = Excel.CurrentWorkbook(){[Name="Status"]}[Content],
#"Merged Group" = Table.NestedJoin(tbData,{"Group"},tbGroup,{"Group"},"tbGroup",JoinKind.Inner),
#"Merged Status" = Table.NestedJoin(#"Merged Group",{"Status"},tbStatus,{"Status"},"Merged Status",JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Status",{"tbGroup", "Merged Status"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Status", type number}})
in
#"Changed Type"
Load To as connection only, and tick Load to Data Model
Now create a DAX measure:
Status Sum:=SUM ( 'Filtered Data'[Status] )
You can then use the following formula on your worksheet, to get the Sum of Status values, for rows matching the criteria specified in the Group and Status tables:
=CUBEVALUE("ThisWorkbookDataModel","[Measures].[Status Sum]")
Simply refresh the data connection to update the value.

Find valid combinations based on matrix

I have a in CALC the following matrix: the first row (1) contains employee numbers, the first column (A) contains productcodes.
Everywhere there is an X that productitem was sold by the corresponding employee above
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the product codes for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks by the respective product codes, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the product codes. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single product IDs, copy the complete second sheet, paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (employee IDs) and the last one (with the concatenated product ids).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the columnlist + datatype for the create table statement
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the columnnames were identical mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata <-- here i simply referenced the whole table
),cte
and in the next part I uses the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, <-- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our harts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns :
prodcode | employee
which containes al the unique combinations of prodcode + employeenumber which is a lot faster and much more practical to query.

Array formula using multiplication and division across 3 columns

I have Inventory data that is in the following format:
Column D | Column E | Column F
Pack Qty | Pack Price | Total Qty
This is followed by multiple rows with various numerical values, with the odd blank row.
To calculate the stock value of any particular product/line, I use =F2/D2*E2.
To calculate the total value of stock I tried {=Sum(F:F/D:D*E:E)} but it returns a #Div/0! error.
As mentioned, some rows are blank. Some items have 0 price, others have 0 stock on hand.
I would like to avoid having to total each line in a new column then total that column.
Try this:
{=SUM(IFERROR(F:F/D:D*E:E,0))}
You can simply wrap your division inside IFERROR() and return 0.
{=SUM(IFERROR(F:F/D:D,0)*E:E)}

Linq - Limit list to 1 row per unique values based on value (minimum) of single field

I have a stored procedure (I cannot edit) that I am calling via linq.
The stored procedure returns values (more complex but important data below):
Customer Stock Item Date Price Priority Qty
--------------------------------------------------------
CUST1 TAP 01-04-2012 £30 30 1 - 30
CUST1 TAP 05-04-2012 £33 30 1 - 30
CUST1 TAP 01-04-2012 £29 20 31 - 99
CUST1 TAP 01-04-2012 £28 10 1 - 30
I am trying to limit this list to rows which have unique Dates and unique quantities in LINQ.
I want to remove items with the HIGHER priority leaving rows with unique dates and qty's.
I have tried several group by's using Max and order by's but have not been able to get a result.
Is there any way to do this via linq?
EDIT:
Managed to convert brad-rem's answer into VB.net.
Syntax below if anyone needs it:
returnlist = (From p In returnlist
Order By p.Qty Ascending, p.Priority
Group By AllGrp = p.Date, p.Qty Into g = Group
Select g.First).ToList
How about the following. It groups by Date and Qty and orders it so that the lower priorities come first. Then, it just selects the first item from each group, which are all the lower priority items:
var result = from d in dbData
orderby d.Priority
group d by new
{
d.Date,
d.Qty
} into group1
select group1.First();

Resources