Generate a new dataset from two existing datasets with conditions

I have two datasets with the same columns, and I would like to create a new one in another sheet containing all rows from the first dataset plus specific rows from the second one.
My first dataset looks like this:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
And the second one looks like this:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 3 | 1 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
In a new sheet, I would like to retrieve all rows from the first dataset and append the rows from the second one that are missing from the first.
If a combination of "Item Type" and "Item Numb" has already been imported from the first dataset, I don't want the matching row from the second dataset; if that combination isn't in the first dataset, I would like to add the row.
This is the result I need:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
Thanks in advance for your time folks!

Try:
=INDEX(ARRAY_CONSTRAIN(QUERY(SORTN(
{Sheet1!A2:D, Sheet1!A2:A&Sheet1!B2:B;
Sheet2!A2:D, Sheet2!A2:A&Sheet2!B2:B}, 9^9, 2, 5, 1),
"where Col1 is not null", 0), 9^9, 4))
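
To make the intended logic explicit outside of Sheets, here is a minimal Python sketch of the same idea: keep every row of the first dataset, then append only those rows of the second whose ("Item Type", "Item Numb") pair has not been seen yet. The function name merge_missing and the tuple-based rows are assumptions made for illustration; they are not part of the formula above.

```python
def merge_missing(first, second, key_cols=(0, 1)):
    """Return all rows of `first`, followed by the rows of `second`
    whose key (by default columns 0 and 1) does not occur in `first`."""
    def key(row):
        return tuple(row[i] for i in key_cols)

    seen = {key(row) for row in first}
    result = list(first)
    for row in second:
        if key(row) not in seen:
            seen.add(key(row))  # also skips repeated keys inside `second`
            result.append(row)
    return result


# Sample data copied from the question: (Item Type, Item Numb, Start Date, End date)
first = [
    (1, 1, "17/02/2022", "21/02/2022"),
    (1, 2, "19/02/2022", "24/02/2022"),
    (2, 1, "15/02/2022", "18/02/2022"),
    (2, 2, "17/02/2022", "20/02/2022"),
    (3, 1, "21/02/2022", "25/02/2022"),
]
second = [
    (1, 2, "17/02/2022", "20/02/2022"),
    (2, 2, "17/02/2022", "20/02/2022"),
    (2, 3, "20/02/2022", "23/02/2022"),
    (3, 1, "20/02/2022", "23/02/2022"),
    (4, 1, "21/02/2022", "24/02/2022"),
    (4, 2, "23/02/2022", "28/02/2022"),
]

merged = merge_missing(first, second)
```

Comparing keys as tuples rather than as concatenated strings avoids accidental collisions such as type 1 / number 12 versus type 11 / number 2, and the rows stay in their original order.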

Related

Best way of storing enumerated fields with ability to change order Postgres

What is the best way to store enumerated fields so that their order can be changed?
Let's say my database looks like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 2 | 2nd | 2 |
| 3 | 3rd | 3 |
| 4 | 4th | 4 |
Now, when the user changes the order like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 2 |
| 2 | 2nd | 3 |
| 3 | 3rd | 4 |
Here I would have to update all rows in this table.
I am considering two solutions.
Solution 1)
When inserting row X between, for example, order 2 and order 3, I would set row X's order field to 2.5. In other words, I would choose the number halfway between the adjacent orders.
The table above would then look like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 1.5 |
| 2 | 2nd | 2 |
| 3 | 3rd | 3 |
Then, after (for example) 16 changes, I would update the table and normalize all order fields, so that the table after normalization would look like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 2 |
| 2 | 2nd | 3 |
| 3 | 3rd | 4 |
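
A minimal Python sketch of Solution 1, under the assumption that the order values are held as floats in a dict keyed by id (the helper names midpoint and normalize are hypothetical and only serve the illustration). Moving one item touches a single value; normalization rewrites all of them back to 1, 2, 3, ...

```python
def midpoint(left, right):
    """Order value halfway between two adjacent order values."""
    return (left + right) / 2


def normalize(orders):
    """Reassign the integers 1, 2, 3, ... following the current sort order."""
    ranked = sorted(orders, key=orders.get)
    return {item_id: position for position, item_id in enumerate(ranked, start=1)}


# id -> order, matching the initial table
orders = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}

# Move id 4 between id 1 and id 2: only one row changes
orders[4] = midpoint(orders[1], orders[2])

current = sorted(orders, key=orders.get)   # ids in display order
normalized = normalize(orders)             # periodic clean-up step
```

Repeated halving between the same two neighbours eventually runs out of floating-point precision, which is why a periodic normalization step like the one described above is needed.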
Solution 2)
I am also considering adding a "next" field (or "next" and "prev" fields) to each row, but that looks to me like a waste of memory.
I really don't want to update the whole table every time somebody changes the order. What is the best way of solving this problem?

Designing a database for categories and subcategories

Basically I'm trying to figure out how Amazon architected their book section. Check out Amazon's book page here (https://www.amazon.com/s/ref=lp_2_ex_n_1?rh=n%3A283155&bbn=283155&ie=UTF8&qid=1522817105).
We are given several main categories: Arts & Photography, Biographies & Memoirs, etc.
If I click on Biographies & Memoirs, for example, I'm led to a series of subcategories, e.g. Biographies & Memoirs > Historical > Asia > Japan
Some subcategory names repeat under other parents, for example: History > Asia > Japan
How can I map this kind of information so that it is scalable?
Is the design below the wrong way to do it?
Categories table
+----+-----------------------+-----------+
| id | name | parent_id |
+----+-----------------------+-----------+
| 1 | Biographies & Memoirs | null |
| 2 | Historical | 1 |
| 3 | Asia | 2 |
| 4 | History | null |
| 5 | Asia | 4 |
| 6 | Japan | 5 |
| 7 | Japan | 3 |
+----+-----------------------+-----------+
Books
+----+-------------------------------------+----------+
| id | name | category |
+----+-------------------------------------+----------+
| 1 | The Lone Samurai | 7 |
| 2 | The Human Tradition in Modern Japan | 7 |
| 3 | Okinawa: The Last Battle | 6 |
+----+-------------------------------------+----------+
Authors
+----+---------------+----------+
| id | firstname | lastname |
+----+---------------+----------+
| 1 | James M. | Burns |
| 2 | Roy E. | Appleman |
| 3 | Russell A. | Gugeler |
| 4 | John | Stevens |
| 5 | William Scott | Wilson |
| 6 | Anne | Walthall |
+----+---------------+----------+
Authors to books (Many to many)
+---------+-----------+
| book_id | author_id |
+---------+-----------+
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
+---------+-----------+
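
To illustrate how the parent_id design above distinguishes the two "Japan" categories, here is a small Python sketch that walks up the parent chain to rebuild the full path of a category. The dict layout and the function name category_path are assumptions for the example; the ids and names are taken from the Categories table.

```python
# id -> (name, parent_id), copied from the Categories table above
categories = {
    1: ("Biographies & Memoirs", None),
    2: ("Historical", 1),
    3: ("Asia", 2),
    4: ("History", None),
    5: ("Asia", 4),
    6: ("Japan", 5),
    7: ("Japan", 3),
}


def category_path(category_id, categories):
    """Follow parent_id links up to the root and return the full path."""
    names = []
    while category_id is not None:
        name, parent_id = categories[category_id]
        names.append(name)
        category_id = parent_id
    return " > ".join(reversed(names))


path_of_book_1 = category_path(7, categories)  # category of "The Lone Samurai"
path_of_book_3 = category_path(6, categories)  # category of "Okinawa: The Last Battle"
```

Each lookup costs one step per level of depth, so the question of scalability mostly comes down to how deep the tree gets and how often full paths are needed.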

Multiple outcomes/scenarios

I have a problem that I have already solved, but I'm wondering whether there is a better way of solving it. I have to create a flag for certain scenarios within a partition of ID and date. My solution was to map out all the possible scenarios and then write "case when" statements for each scenario with its specific outcome. In other words, I defined the outcomes by hand. I am wondering if there's another way, something along the lines of letting SQL derive the outcomes instead of me.
Thanks a lot!
Background:
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
| ID | Month | Status | Value | Flag | Scenario Number | Scenario Description |
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
| 1 | 1/01/2016 | First | 123 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/01/2016 | Second | 456 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/01/2016 | | 789 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/02/2016 | Second | 123 | Yes | 2 | First does not exist, two second but have different values. Flag these as Yes |
| 1 | 1/02/2016 | Second | 456 | Yes | 2 | First does not exist, two second but have different values. Flag these as Yes |
| 1 | 1/02/2016 | Second | 123 | No | 3 | First does not exist, two second have same values. Do not flag |
| 1 | 1/02/2016 | Second | 123 | No | 3 | First does not exist, two second have same values. Do not flag |
| 1 | 1/03/2016 | Second | 123 | No | 4 | Only one entry of Second exists. Do not flag |
| 1 | 1/04/2016 | | 123 | Yes | 5 | Two blanks for the partition. Flag these as Yes |
| 1 | 1/04/2016 | | 123 | Yes | 5 | Two blanks for the partition. Flag these as Yes |
| 1 | 1/05/2016 | | | No | 6 | Only one entry of blank exist. Do not flag these |
| 1 | 1/06/2016 | First | 123 | No | 7 | First exists for the partition. Do not flag |
| 1 | 1/06/2016 | | 456 | No | 7 | First exists for the partition. Do not flag |
| 1 | 1/07/2016 | Second | 123 | Yes | 8 | First does not exist and second and blank do not have the same value. Flag these. |
| 1 | 1/07/2016 | | 456 | Yes | 8 | First does not exist and second and blank do not have the same value. Flag these. |
| 1 | 1/07/2016 | Second | 123 | Yes | 8 | First does not exist and second and blank have the same value. Flag these. |
| 1 | 1/07/2016 | | 123 | Yes | 8 | First does not exist and second and blank have the same value. Flag these. |
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
Data:
+----+-----------+-------+----------+---------------+
| ID | Month | Value | Priority | Expected_Flag |
+----+-----------+-------+----------+---------------+
| 1 | 1/01/2016 | 96.01 | | Yes |
| 1 | 1/01/2016 | 96.01 | | Yes |
| 1 | 1/02/2016 | 65.2 | First | No |
| 1 | 1/02/2016 | 3.47 | Second | No |
| 1 | 1/02/2016 | 45.99 | | No |
| 11 | 1/01/2016 | 25 | | No |
| 11 | 1/02/2016 | 74.25 | Second | No |
| 11 | 1/02/2016 | 74.25 | Second | No |
| 11 | 1/02/2016 | 23.25 | | No |
| 24 | 1/01/2016 | 1.25 | First | No |
| 24 | 1/01/2016 | 1.365 | | No |
| 24 | 1/04/2016 | 1.365 | First | No |
| 24 | 1/04/2016 | 1.365 | | No |
| 24 | 1/05/2016 | 1.365 | First | No |
| 24 | 1/05/2016 | 1.365 | First | No |
| 24 | 1/06/2016 | 1.365 | Second | No |
| 24 | 1/06/2016 | 1.365 | Second | No |
| 24 | 1/07/2016 | 1.365 | Second | Yes |
| 24 | 1/07/2016 | 1.365 | | Yes |
| 24 | 1/08/2016 | 1.365 | First | No |
| 24 | 1/08/2016 | 1.365 | | No |
| 24 | 1/09/2016 | 1.365 | Second | No |
| 24 | 1/09/2016 | 1.365 | | No |
| 27 | 1/01/2016 | 0 | Second | Yes |
| 27 | 1/01/2016 | 0 | Second | Yes |
| 27 | 1/02/2016 | 45.25 | Second | No |
| 3 | 1/01/2016 | 96.01 | First | No |
| 3 | 1/01/2016 | 96.01 | First | No |
| 3 | 1/03/2016 | 96.01 | First | No |
| 3 | 1/03/2016 | 96.01 | First | No |
| 35 | 1/01/2016 | | | Yes |
| 35 | 1/01/2016 | | | Yes |
| 35 | 1/02/2016 | | First | No |
| 35 | 1/02/2016 | | Second | No |
| 35 | 1/02/2016 | | | No |
| 35 | 1/02/2016 | | | No |
| 35 | 1/03/2016 | | Second | Yes |
| 35 | 1/03/2016 | | Second | Yes |
| 35 | 1/04/2016 | | Second | No |
| 35 | 1/04/2016 | | Second | No |
+----+-----------+-------+----------+---------------+

Stripping out dates, of several formats, from strings

I have a column of strings called "MyStrings", like the following:
...
Foo bar Jul15 blah blah.xlsx
Choo bar Jul-15 blah far.xlsx
Star bar 10-Jul-15 blah far.xlsx
Car Star bar 10.Jul.2015 blah far.xlsx
...
...
I'd like to manipulate the strings so that all dates, whatever their format, are left out of the results.
So the following query:
SELECT results = <manipulated "MyStrings">
FROM aTable
Should have these results:
...
Foo bar blah blah.xlsx
Choo bar blah far.xlsx
Star bar blah far.xlsx
Car Star bar blah far.xlsx
...
...
Is there a quick way of doing this or do I need to consider each format individually?

You need a split function.
If you split by <space> first, it is easy to write a pattern for each format:
monDD
mon-DD
DD-mon-YY
DD.mon.YYYY
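-- dbo.SplitStrings is the user-defined split function mentioned above;
-- it returns one row per token with its position (Number) and text (Item).
WITH splitCTE AS (
    SELECT s.[id], f.Number, f.Item
    FROM dbo.SourceData AS s
    CROSS APPLY dbo.SplitStrings(s.[test], ' ') AS f
)
SELECT *,
    -- Label each token with the date format it matches, if any
    CASE
        WHEN Item LIKE 'Jul[0-9][0-9]' THEN 'mmmdd'
        WHEN Item LIKE 'Jul-[0-9][0-9]' THEN 'mmm-dd'
        WHEN Item LIKE '[0-9][0-9]-Jul-[0-9][0-9]' THEN 'dd-mmm-yy'
        WHEN Item LIKE '[0-9][0-9].Jul.[0-9][0-9][0-9][0-9]' THEN 'dd.mmm.yyyy'
        ELSE ''
    END AS matchType
FROM splitCTE;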
You need a join against a list of three-character month abbreviations to replace the hard-wired Jul.
It is easy to expand this to also cover full month names.
This will match Jul77 as mmmdd, but it is a start.
You can calculate an IsValidDate column in another step.
For some of the formats you can use CONVERT to check for a valid date.
For others, like Jul77, you can separate the first three characters from the last two and try to build a date.
OUTPUT
| id | Number | Item | matchType |
|----|--------|-------------|-------------|
| 1 | 1 | Foo | |
| 1 | 2 | bar | |
| 1 | 3 | Jul15 | mmmdd |
| 1 | 4 | blah | |
| 1 | 5 | blah.xlsx | |
| 2 | 1 | Choo | |
| 2 | 2 | bar | |
| 2 | 3 | Jul-15 | mmm-dd |
| 2 | 4 | blah | |
| 2 | 5 | far.xlsx | |
| 3 | 1 | Star | |
| 3 | 2 | bar | |
| 3 | 3 | 10-Jul-15 | dd-mmm-yy |
| 3 | 4 | blah | |
| 3 | 5 | far.xlsx | |
| 4 | 1 | Car | |
| 4 | 2 | Star | |
| 4 | 3 | bar | |
| 4 | 4 | 10.Jul.2015 | dd.mmm.yyyy |
| 4 | 5 | blah | |
| 4 | 6 | far.xlsx | |
Then use your favorite FOR XML PATH technique to join the tokens back together, leaving out the matching elements.
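
The whole split / match / rejoin idea can be sketched outside SQL in a few lines of Python. This is only an illustration under assumptions: the regular expression covers just the four formats from the question (with any three-letter English month abbreviation), and the names DATE_TOKEN and strip_date_tokens are made up for the example. Like the LIKE patterns above, it does not check that the matched token is a valid date.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Optional "DD-" or "DD." prefix, month abbreviation, optional "-" or ".",
# then a two- or four-digit number.
DATE_TOKEN = re.compile(
    rf"(?:\d{{1,2}}[-.])?(?:{MONTHS})[-.]?\d{{2}}(?:\d{{2}})?",
    re.IGNORECASE,
)


def strip_date_tokens(text):
    """Split on spaces, drop tokens that look like dates, and join the rest."""
    kept = [token for token in text.split(" ") if not DATE_TOKEN.fullmatch(token)]
    return " ".join(kept)


samples = [
    "Foo bar Jul15 blah blah.xlsx",
    "Choo bar Jul-15 blah far.xlsx",
    "Star bar 10-Jul-15 blah far.xlsx",
    "Car Star bar 10.Jul.2015 blah far.xlsx",
]
results = [strip_date_tokens(s) for s in samples]
```

Because each token must match the pattern in full, words such as "Star" or "blah.xlsx" are left alone.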

What's the idiomatic way to split a Smalltalk array at the spot where a series of values changes?

Given an array of domain objects (with the properties subject, trial and run) like this:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
I want to split it into multiple arrays at every point where the value for subject changes.
The above example should result in three arrays:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
+---------+-------+-----+
+---------+-------+-----+
| 2 | 1 | 1 |
| 2 | 2 | 1 |
+---------+-------+-----+
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
What would be the idiomatic Smalltalk (Pharo) way to split the array like this?

SequenceableCollection >> piecesCutWhere:, which takes a two-argument block, is your friend:
{ 1. 1. 2. 2. 2. 3. 1. 2. } piecesCutWhere: [:left :right | left ~= right]
=> an OrderedCollection(#(1 1) #(2 2 2) #(3) #(1) #(2))
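
For readers who want to check the expected grouping against the table in the question, the same "cut where the key changes" behaviour can be sketched in Python with itertools.groupby, which groups only consecutive items with equal keys. The tuple layout (subject, trial, run) is an assumption standing in for the domain objects.

```python
from itertools import groupby

# (subject, trial, run) rows copied from the question
rows = [
    (1, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 2),
    (2, 1, 1), (2, 2, 1),
    (1, 1, 1), (1, 2, 1),
]

# A new piece starts every time the subject differs from the previous row
pieces = [list(group) for _, group in groupby(rows, key=lambda row: row[0])]
```

The second run of subject 1 stays a separate piece because groupby never merges non-adjacent groups, which matches the three arrays asked for above.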
