Take UNIQUE rows value under conditions - arrays

I have a G-spreadsheet where I applied an API connector, that takes data (invoices data) from an external DB; every day the G-sheet receives the last 250 rows Data from the external DB, so some rows are news and others are repeated and added again below in the list.
Usually, I have 2-3 times repeated rows with the same information and sometimes I can have repeated value like: invoiceID - invoice number - ecc., but with a different amount (because the administration system sometimes has to correct the invoices).
My question is How can I create a formula in a new sheet collecting the UNIQUE newest values of the repeated rows?
As you can see the Fetch date is the API connector that for 3 days repeated the value, and on 29 it gots a different amount after correction of the invoice from the admin department.

try SORTN with the 3rd parameter set to 2:
=SORTN(SORT(B2:G; 1; 0); 99^99; 2; 5; 0)

Related

Anylogic: How to create plot from database table?

In my Anylogic model I succesfully create plots of datasets that count the number of trucks arriving from terminals each hour in my simulation. Now, I want to add the actual/"observed" number of trucks arriving at a terminal, to compare my simulation to these numbers. I added these numbers in a database table (see picture below). Is there a simple way of adding this data to the plot?
I tried it by creating a variable that reads the database table for every hour and adding that to a dataset (like can be seen in the pictures below), but this did not work unfortunately (the plot was empty).
Maybe simply delete the variable and fill the dataset at the start of the model by looping through the dbase table data. Use the dbase query wizard to create a for-loop. Something like this should work:
int numEntries = (int) selectFrom(observed_arrivals).count();
DataSet myDataSet = new DataSet(numEntries);
List<Tuple> rows = selectFrom(observed_arrivals).list();
for (Tuple
row : rows) {
myDataSet.add(row.get( observed_arrivals.hour ), row.get( observed_arrivals.terminal_a ));
}
myChart.addDataSet(myDataSet);
You don't explain why it "didn't work" (what errors/problems did you get?), nor where you defined these elements.
(1) Since you want both observed (empirical) and simulated arrivals per terminal, datasets for each should be in the Terminal agent. And then the replicated plot (in Main) can have two data entries referring to data sets terminals(index).observedArrivals and terminals(index).simulatedArrivals or whatever you name them.
(2) Using getHourOfDay to add to the observed dataset is wrong because that just returns 0-23 (i.e., the hour in the current day for the current model date). Your database table looks like it has hours since model start, so you just want time(HOUR) to get the model time in elapsed hours (irrespective of what the model time unit is). Or possibly time(HOUR) - 1 if you only want to update the empirical arrivals for the hour at the end of that hour (i.e., at the same time that you updated the simulated arrivals).
(3) Using a Variable to get the database value each hour doesn't work because a variable's initial value is only evaluated once at model initialisation. You want an hourly cyclic Event in Terminal instead which adds the relevant row's value. (You need to use the Insert Database Query wizard to generate the relevant Java code for the query you need in the event's action.)
(4) Because you have a database table with specifically-named columns for each terminal (columns terminal_a and presumably terminal_b etc.) that makes it slightly more awkward. (This isn't proper relational table design where, instead of 4 columns for the 4 terminals, you'd instead have two columns for terminal_id and observed_value with a row for each time period and terminal combination.)
So your database query expression (in your Terminal agents) will need to use the SQL format (not the QueryDSL format) so that you can 'stitch in' the correct column name into the SQL.

SQL syntax to calculate total menu items per meal orderID across multiple meal orderIDs

I am new to SQL and Stack overflow and have a question about SQL Server syntax. i have searched online but I am not finding what I need and I would appreciate your assistance in this matter.
I have data in a source table for meal orders (each with a specific ID (e.g. 12345C) and items of each order (e.g. sandwich, drink, chips), each with an associated number starting with 1. For instance, the sandwich would have an item number of 1, the chips would be item # 2, and the drink would be item # 3 for the same orderID 12345C. The prior example would therefore have 3 rows of data in the source table for orderID 12345C.
My questions are these:
how do I use a SQL expression to determine the number of items per each order (e.g. 3 for the above example, which is also the maximum value for item number for each orderID)
and then add all of these items per order across hundreds of orders per day to determine the daily total number of items ordered.
So, if I had 3 orders in one day - one with 2 items, the second with 3 items, and the third with 4 items, I would like my final number to be 9.
This number is for use in a Sisense dashboard that allows SQL syntax in the field definition. Thank you for your help!
It is a bit difficult to explain but I am not able to use a query from a table because I am working with a dashboard in Sisense so I am adding fields in a pivot display and one of the fields I would like to include is the total count of order items per day (across several dozen orderIDs).
Here is an example of the data in the table: from the example I would like the final answer for orderID 1787588 to be 3 (there are 3 items within the order).

Google Data Studio date aggregation - average number of daily users over time

This should be simple so I think I am missing it. I have a simple line chart that shows Users per day over 28 days (X axis is date, Y axis is number of users). I am using hard-coded 28 days here just to get it to work.
I want to add a scorecard for average daily users over the 28 day time frame. I tried to use a calculated field AVG(Users) but this shows an error for re-aggregating an aggregated value. Then I tried Users/28, but the result oddly is the value of Users for today. The division seems to be completely ignored.
What is the best way to show average number of daily users over a time frame? Average daily users over 10 days, 20 day, etc.
Try to create a new metric that counts the dates eg
Count of Date = COUNT(Date) or
Count of Date = COUNT_DISTINCT(Date) in case you have duplicated dates
Then create another metric for average users
Users AVG = (Users / Count of Date)
The average depends on the timeframe you have selected. If you are selecting the last 28 days the average is for those 28 days (dates), if you filter 20 days the average is for those 20 days etc.
Hope that helps.
I have been able to do this in an extremely crude and ugly manner using Google Sheets as a means to do the calculation and serve as a data source for Data studio.
This may be useful for other people trying to do the same thing. This assumes you know how to work with GA data in Sheets and are starting with a Report Configuration. There must be a better way.
Example for Average Number of Daily Users over the last 7 days:
Edit the Report Configuration fields:
Report Name: create one report per day, in this case 7 reports. Name them (for example) Users-1 through Users-7. These are your Row 2 values. You'll have 7 columns, with the first report name in column B.
Start Date and End Date: use TODAY()-X where X is the number of days previous to define the start and end dates for each report. Each report will contain the user count for one day. Report Users-1 will use TODAY()-1 for start and end, etc.
Metrics: enter the metrics e.g. ga:users and ga:new users
Create the reports
Use 'Run reports' to have the result sheets created and populated.
Create a sheet for an interim data set you will use as the basis for the average calculation. The first column is date, the remaining columns are for the metrics, in this case Users and New Users.
Populate the interim data set with the dates and values. You will reference the Report Configuration to get the dates, and you will pull the metrics from each of the individual reports. At this stage you have a sheet with date in first columns and values in subsequent columns with a row for each day's values. Be sure to use a header.
Finally, create a sheet that averages the values in the interim data set. This sheet will have a column for each metric, with one value per column. The one value is calculated from the series in the interim data set, for example =AVG(interim_sheet_reference:range) or any other calculation you'd like to do.
At last, you can use Data Studio to connect to this data source and use the values. For counts of users such as this example, you would use Sum as the aggregation field type when you are creating the data source.
It's super ugly but it works.

Getting minimum values out of calculated table

I have:
a table with user names
a table indicating actions with columns for user name, action time, action name. Named events unique_events
I started collecting data on January. I want to have a column in my table of user names which indicates how long it has been since a user first used my application and the first of January.
So if a user first logged in in January, the value of the row with that user's name will be 0. If one logged in on March it will be 2.
I tried:
Column = DATEDIFF(01-01-2016, MIN(SELECTCOLUMNS(FILTER('events unique_events','events unique_events'[User Name] = Users[User Name]),"DatedTime", [DatedTime])),MONTH)
which returns an error saying the Min function needs a column reference.
I also tried the same with FirstDate instead of MIN which returned an error saying FirstDate can't be used with summarize functions.
Any other ideas on how to achieve this, or fix what I tried?
(for simplicity, I will call your table 'Events', and user login dates field 'User_Login_Date').
First, define your app start date as a measure:
App_Start_Date:= DATE(2016, 1, 1)
Then, define measure that finds min differences between Application Start Date and User Login dates:
User_Start_Diff=: MINX(Events, DATEDIFF([App_Start_Date], Events[User_Login_Date], Month))
Drop this measure into a pivot table against user names, and you should have your desired result.
How it works:
1) MINX goes record by record and calculates date differences for each customer login. It then finds minimum in the results;
2) When you drop the measure into a pivot table, it splits MINX results by customer, and recalculates min for each of them separately. You don't need to do the grouping.
Creation of [Start_Date] measure is not technically necessary but a matter of good style - don't hardcode values in your formulas, always create measures. You will thank yourself later when you need to make a change.

Generating Working Hours using SQL Server Query

I have this data and I need to generate a query that will give the output below
You can do this kind of groupings of rows with 2 separate row_number()s. Have 1 for all the data, ordered by date and second one ordered by code and date. To get the groups separated from the data, use the difference between these 2 row_number()s. When it changes, then it's a new block of data. You can then use that number in group by and take the minimum / maximum dates for each of them.
For the final layout you can use pivot or sum + case, most likely you want to have a new row_number for getting the rows aligned properly. Depending if you can have data missing / not matching you'll need probably additional checks.

Resources