Remove duplicate values based on timestamp - sql-server

I need your help with an SQL query that has to remove duplicate entries from a table in two passes, mostly using the datestamp column as the criterion.
The DBMS is Microsoft SQL Server.
Here are a few more details:
Terminology: a module is basically a group of single-machine workplaces at which users operate.
Table:
The ModNam column is fixed: there are 15 modules from M A01 to M A15, then comes the B row M B01 ... M B15, and so on until row F.
The Pos column is irrelevant at the moment.
The MdCod column holds the code of the machine placed at a position in a given module. It can be replaced by another machine at any given time.
I have one query that will be inserting data into this table by copying entries from another table, every time a new machine is added to one of the positions.
The tricky part for me is a second query that should compare records in two phases:
1) Inside the same module (first pass of the query, shown in red in the attached example pic):
if the ModNam value is the same and MdCod matches between the entries, the most recent datestamp decides which single entry stays, and the other duplicates get deleted.
2) Across different modules (second pass of the query, shown in purple in the attached example pic):
if the ModNam values are different and MdCod matches between the entries, the most recent datestamp decides which single entry stays, and the other duplicates get deleted.
Please help and advise.
Example pic (updated):
Thank you all in advance.
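A minimal sketch of the two passes, written in Python against an in-memory SQLite database so that it can run anywhere; it is not SQL Server syntax. The table name ModulePlacement, the Id key and the Stamp column name are assumptions, since the question names neither the table nor the datestamp column, and the sample rows are made up. Each pass deletes every row for which a newer row with the same MdCod exists, first within the same module and then across modules.

```python
import sqlite3

# Hypothetical schema: ModNam, Pos and MdCod come from the question; the table
# name, the Id key and the Stamp column name are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ModulePlacement ("
    "Id INTEGER PRIMARY KEY, ModNam TEXT, Pos INTEGER, MdCod TEXT, Stamp TEXT)"
)
conn.executemany(
    "INSERT INTO ModulePlacement VALUES (?, ?, ?, ?, ?)",
    [
        (1, "M A01", 1, "X1", "2020-01-01 08:00:00"),
        (2, "M A01", 2, "X1", "2020-01-02 08:00:00"),
        (3, "M B03", 1, "X1", "2020-01-03 08:00:00"),
        (4, "M A02", 1, "Y7", "2020-01-01 09:00:00"),
        (5, "M A01", 3, "X1", "2020-01-01 07:00:00"),
    ],
)

# Delete a row when a newer row with the same MdCod exists. {op} decides whether
# the newer row must be in the same module ("=") or in another module ("<>").
# Id breaks ties between identical timestamps so exactly one row survives.
DELETE_OLDER = """
DELETE FROM ModulePlacement
WHERE EXISTS (
    SELECT 1 FROM ModulePlacement AS newer
    WHERE newer.MdCod = ModulePlacement.MdCod
      AND newer.ModNam {op} ModulePlacement.ModNam
      AND (newer.Stamp > ModulePlacement.Stamp
           OR (newer.Stamp = ModulePlacement.Stamp
               AND newer.Id > ModulePlacement.Id))
)
"""


def delete_older(connection, module_op):
    """Run one pass of the clean-up with the given module comparison operator."""
    connection.execute(DELETE_OLDER.format(op=module_op))


delete_older(conn, "=")   # pass 1: duplicates inside the same module
delete_older(conn, "<>")  # pass 2: duplicates across different modules

remaining = [row[0] for row in conn.execute("SELECT Id FROM ModulePlacement ORDER BY Id")]
```

On SQL Server itself, the same idea is often written with ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) inside a CTE, deleting the rows numbered above 1.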

Related

Anylogic: How to create plot from database table?

In my Anylogic model I successfully create plots of datasets that count the number of trucks arriving from terminals each hour in my simulation. Now, I want to add the actual/"observed" number of trucks arriving at a terminal, to compare my simulation to these numbers. I added these numbers in a database table (see picture below). Is there a simple way of adding this data to the plot?
I tried it by creating a variable that reads the database table for every hour and adding that to a dataset (like can be seen in the pictures below), but this did not work unfortunately (the plot was empty).
Maybe simply delete the variable and fill the dataset at the start of the model by looping through the database table data. Use the database query wizard to create a for-loop. Something like this should work:
int numEntries = (int) selectFrom(observed_arrivals).count();
DataSet myDataSet = new DataSet(numEntries);
List<Tuple> rows = selectFrom(observed_arrivals).list();
for (Tuple row : rows) {
    myDataSet.add(row.get( observed_arrivals.hour ), row.get( observed_arrivals.terminal_a ));
}
myChart.addDataSet(myDataSet);
You don't explain why it "didn't work" (what errors/problems did you get?), nor where you defined these elements.
(1) Since you want both observed (empirical) and simulated arrivals per terminal, datasets for each should be in the Terminal agent. And then the replicated plot (in Main) can have two data entries referring to data sets terminals(index).observedArrivals and terminals(index).simulatedArrivals or whatever you name them.
(2) Using getHourOfDay to add to the observed dataset is wrong because that just returns 0-23 (i.e., the hour in the current day for the current model date). Your database table looks like it has hours since model start, so you just want time(HOUR) to get the model time in elapsed hours (irrespective of what the model time unit is). Or possibly time(HOUR) - 1 if you only want to update the empirical arrivals for the hour at the end of that hour (i.e., at the same time that you updated the simulated arrivals).
(3) Using a Variable to get the database value each hour doesn't work because a variable's initial value is only evaluated once at model initialisation. You want an hourly cyclic Event in Terminal instead which adds the relevant row's value. (You need to use the Insert Database Query wizard to generate the relevant Java code for the query you need in the event's action.)
(4) Because you have a database table with specifically-named columns for each terminal (columns terminal_a and presumably terminal_b etc.) that makes it slightly more awkward. (This isn't proper relational table design where, instead of 4 columns for the 4 terminals, you'd instead have two columns for terminal_id and observed_value with a row for each time period and terminal combination.)
So your database query expression (in your Terminal agents) will need to use the SQL format (not the QueryDSL format) so that you can 'stitch in' the correct column name into the SQL.
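To illustrate the table-design remark in point (4), here is a small Python sketch (not AnyLogic code) that reshapes one-column-per-terminal rows into the terminal_id / observed_value layout described there. The column name terminal_b and all values are made up for illustration.

```python
# Hypothetical wide rows mirroring the table in the question: one row per hour,
# one column per terminal. The values are made up.
wide_rows = [
    {"hour": 0, "terminal_a": 4, "terminal_b": 7},
    {"hour": 1, "terminal_a": 6, "terminal_b": 5},
]


def to_long(rows, key="hour"):
    """Turn one-column-per-terminal rows into (hour, terminal_id, observed_value) rows."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != key:
                long_rows.append((row[key], column, value))
    return long_rows


long_rows = to_long(wide_rows)
```

With the long layout, a single query filtered on terminal_id serves every terminal, so no column name has to be stitched into the SQL.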

Long calculation times with XLOOKUP vs INDEX-MIN-COLUMN

I'm using this formula =IF(B24="","",IFERROR(INDEX(Sheet3!$C$3:$EE$3,,MIN(IF(Sheet3!$C$4:$EE$23=(Sheet2!C24&$K$18),COLUMN(Sheet3!$C:$EE)))-2),"NF")) to return a cell value in the top row of an array - a date in this case.
The search criteria is a combination of a unique project number and a 2 digit status alphanumerical code for the project. The array consists of 23 rows where combinations of the unique numbers are found, each with different status codes.
So essentially, I'm building a FILTERED project status dashboard that returns dates linked to the relevant project status.
The formula above is inspired by ( LINK ), which uses a very similar layout, but with town suburbs linked to postal codes instead of project numbers and status codes. The formula works well (though it is not entered as an array formula), but I don't have a single formula in the sheet: I have 3 300 occurrences of it.
The problem comes in when the user changes the FILTER - Excel recalculates the entire dashboard and that takes anywhere from 2 to 5 minutes to run. You hit the escape button and cancel the calculation after setting the filter, but Excel just starts calculating again after a few seconds. After that, Excel's response is sluggish and almost unusable. Yes - our hardware is pretty weak ...
I tried XLOOKUP as well, but can't set the "lookup_array" to an array ( Sheet3!$C$4:$EE$23 ) because it doesn't match the "return_array" ( Sheet3!$C$3:$EE$3 ). Concatenating the lookup arrays with & works, but then you'd have to do that for all 23 rows, and again, multiply that by 3 300.
I thought of creating a UDF, but the function will still be called every time Excel recalculates after filtering... 3 300 calls ...
Any ideas on how to make the INDEX version run faster, or make the XLOOKUP accept the lookup_array as Sheet3!$C$4:$EE$23 in the hopes that it'll run faster?
Thank you!
Not really an elegant solution, but it works.
I imported the dataset into a helper sheet, where I combined the cell value with the corresponding value in Column A for each row ( a name in this case ) and the date from row 1 for each column, using underscore as a delimiter.
This new data range was then given a unique name, EE in this case.
On a second helper sheet, I entered this formula =INDEX(Filtered,1+INT((ROW('Sheet1'!C3)-1)/COLUMNS(Filtered)),MOD(ROW('Sheet1'!C3)-1+COLUMNS(Filtered),COLUMNS(Filtered))+1) and dragged it down until it returned a #REF! error, then went back one row before the error.
This transposes all the data into a single column G. Using =UNIQUE(SORT(FILTER(B3:B3240,B3:B3240<> "",""))) then gives me a filtered list of unique values in column H, on which I then run
=IF(H3="","",LEFT(H3, SEARCH("_",H3,1)-1)) for the first data value in I, and
=IF(H3="","",MID(H3, SEARCH("_",H3) + 1, SEARCH("_",H3,SEARCH("_",H3)+1) - SEARCH("_",H3) - 1)) for the middle data value in J, and
=IF(H3="","",IFERROR(TEXT(RIGHT(H3,5),"yyyy-mm-dd"),"NF")) for the last data value in K.
Then just run XLOOKUP across columns I, J and K.
It runs quickly and solves a few of the other issues I had as well.
The second data set has just over 35 000 rows - still works well and fast.
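The speed-up comes from building the lookup data once instead of rescanning the whole block for each of the 3 300 cells. A rough Python sketch of that principle (not Excel code, and not the exact helper-sheet layout above): it maps each key to the date that heads the leftmost column in which the key appears, which is what the MIN(IF(...COLUMN...)) formula in the question computes. The keys and dates are made up.

```python
def build_index(header_dates, grid):
    """Map each key to the header date of the leftmost column in which it appears."""
    leftmost = {}
    for row in grid:
        for col, key in enumerate(row):
            # Skip blank cells; keep the smallest column index seen for each key.
            if key and (key not in leftmost or col < leftmost[key]):
                leftmost[key] = col
    return {key: header_dates[col] for key, col in leftmost.items()}


# Made-up sample: keys are project number + status code, headers are dates.
header_dates = ["2023-01-05", "2023-01-12", "2023-01-19"]
grid = [
    ["P100AA", "P100BB", ""],
    ["P200AA", "", "P100BB"],
]

index = build_index(header_dates, grid)   # one scan of the block
found = index.get("P100BB", "NF")         # each lookup is then a dictionary hit
missing = index.get("P999ZZ", "NF")       # "NF" mirrors the IFERROR fallback
```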

How to stop Heap Analytics grouping assets into "OTHER" Category

I think this might be very simple.
I wrote a query in heap to tell me which users were part of an event and how many times they engaged in it during the year.
The result is a simple table with username and number of occurrences.
It worked. However, Heap has this weird behavior of choosing multiple results (maybe at random?) and throwing them into a single "Other (X other results)" category, where X is the number of grouped results.
So I end up with a table of 20, maybe 30, users and occurrences, and one row of "Other (X other results)".
I shrunk the query to see results from a smaller subset of dates and the "Other" category disappeared.
I really need to see every individual row in my query results! Even if it's paginated.
Help! Thank you
You can export the result as a CSV. The downloaded file will contain all the results (all single entries without the grouped OTHER).
In the current UI, you can find Export to CSV at the top of the report view.

How to count unique occurrences with criteria in excel

I'm using the below array formula to count the unique occurrences of text in column C using the agent name in column G as the reference. This is giving me multiple issues.
=SUM( --(FREQUENCY(IF(G3:G100000 = J5,MATCH(C3:C100000,C3:C100000,0)),ROW(C3:C100000) - ROW(C3) + 1) > 0))
Depending on the data set I'm using, multiple agents will return a #N/A result and I can't figure out why.
Each dataset I'm using is 20k to 30k lines, so the formulas take a long time to process.
Any ideas how I could do this faster or better? Also any ideas why some agents get bad returns?
I am assuming that you are looking for the number of unique combinations of columns C and G.
Create a pivot table and check the box to add this data to the data model.
Drag both column headers to the Rows section, and also drag one (of those same two) into the Values section.
Click on the field in the Values section > Value Field Settings > Summarize Values By > choose Distinct Count. This removes all duplicates.
Click the Row Labels filter and uncheck the blanks.
You can drop in new data then right-click on the pivot and refresh to see the new results. See the image.
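For comparison, the distinct count that the pivot table produces can be sketched in Python as below. This only illustrates the logic, it is not an Excel feature; the ticket labels and agent names are made up, with the first item of each pair standing in for column C and the second for column G.

```python
from collections import defaultdict


def distinct_counts(rows):
    """Count the distinct text values seen for each agent, ignoring blanks."""
    seen = defaultdict(set)
    for text, agent in rows:
        if text and agent:
            seen[agent].add(text)
    return {agent: len(values) for agent, values in seen.items()}


# Made-up sample rows: (column C text, column G agent).
rows = [
    ("ticket-1", "Ann"),
    ("ticket-1", "Ann"),
    ("ticket-2", "Ann"),
    ("ticket-2", "Bob"),
    ("", "Bob"),
]

counts = distinct_counts(rows)
```

Each row is visited once, which is why this kind of aggregation stays fast at 20k to 30k lines.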

Vlookup from multiple criteria to display nearest answer

I was hoping someone can help me. I have hit a solid wall.
I have a table with product information included and I am building a calculator which should spit out a number of options based on set criteria which are in the table. I am failing at just pulling through a code. I feel rather embarrassed asking about how to do a vlookup here. But basically I have a vlookup which depends on multiple criteria, and I need the calc to return the nearest match (if applicable) based on these criteria.
Criteria 1 = Product
Criteria 2 = Type
Criteria 3 = Height
Criteria 4 = Min
I have created a search key in the table to concatenate all of these columns and then done a vlookup, which is =Vlookup(Criteria1 & Criteria2 & Criteria3 & Criteria4, Table Data, Code Required) But this does not appear to be giving me results, it either coughs out an error or the incorrect product. Below is my data and my calc I am hoping to complete. Can someone please help?
Here is an example looking for a closest match on Min. It demonstrates the principle so you can extend.
The closest match formula part is:
MATCH(MIN(ABS(E2:E4-K2)),ABS(E2:E4-K2),0)
Column E holds the Min values and K2 holds the target Min. This is an array formula entered with Ctrl+Shift+Enter. You would adjust the range E2:E4 to fit your data.
The multiple criteria part is using:
=MATCH(lookup_value_1&lookup_value_2&lookup_value_3, lookup_array_1&lookup_array_2&lookup_array_3, match_type)
Here you are concatenating your parameters and searching for a match of that concatenation in the table (you could do this against the key column if the key is made up of the same parameters).
Overall formula with some test data (using one estimate figure):
=INDEX(F:F,MATCH(K1&K5&J5&INDEX(E2:E4,MATCH(MIN(ABS(E2:E4-K2)),ABS(E2:E4-K2),0)),B:B&C:C&D:D&E:E,0))
Remember that the combined formula above is an array formula, so enter it with Ctrl+Shift+Enter. You can reduce the ranges from entire columns to only those rows holding data.
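The intent of the lookup, exact matches on Product, Type and Height plus the nearest Min, can be sketched in Python as follows. Note that this sketch filters on the exact criteria first and only then picks the nearest Min, which is a variation on the formula above rather than a translation of it. The field names, codes and values are made up.

```python
def closest_code(rows, product, type_, height, target_min):
    """Return the code of the row matching product/type/height whose Min is nearest."""
    candidates = [
        r for r in rows
        if (r["product"], r["type"], r["height"]) == (product, type_, height)
    ]
    if not candidates:
        return None  # plays the role of an #N/A result
    return min(candidates, key=lambda r: abs(r["min"] - target_min))["code"]


# Made-up sample table.
rows = [
    {"product": "Door", "type": "Single", "height": 2000, "min": 600, "code": "D-600"},
    {"product": "Door", "type": "Single", "height": 2000, "min": 900, "code": "D-900"},
    {"product": "Window", "type": "Single", "height": 2000, "min": 700, "code": "W-700"},
]

best = closest_code(rows, "Door", "Single", 2000, 720)
none_found = closest_code(rows, "Gate", "Single", 2000, 720)
```

Filtering before measuring distance means a closer Min on an unrelated product (the Window row here) cannot hijack the result.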
Test data:
I am not typing all that out from the picture, so here is a quick and dirty version.
I tried QHarr's solution, but it didn't work for all the rows.
My solution is:
Add a column with:
=IF(E2 < $K$2, E2, 0) and copy for all rows
In L5 create the formula:
{=INDEX(F2:F19,MATCH($K$1&K5&$J$5&INDEX(E2:E19,MATCH(MAX(IF(B2:B19=$K$1,1,0)*IF(C2:C19=K5,1,0)*IF(D2:D19=$J$5,1,0)*G2:G19,0),E2:E19,0)),B2:B19&C2:C19&D2:D19&E2:E19,0))}
Copy the formula to L6 and L7
Excel exercise printscreen
I originally marked this as answered, and it did work initially, but as I added more products it began to fail. After much trial and error, I did manage to find a simple solution: {=INDEX(Calc!$I$2:$I$189,MATCH(Output!$H$7,IF(Calc!$B$2:$B$189=Output!A12,Calc!$H$2:$H$189),1))}
