Database structure, query vs iteration - database

(Basic question that probably is a duplicate, but I don't know what to search for, so feel free to edit this question with the proper database terms)
How would one retrieve groups of rows (family members) from a database table based on columns (last and first names) efficiently? E.g. from this
last first ...
doe john ...
doe jane ...
smith jimmy ...
smith ted ...
smith anna ...
to something like this (additional data omitted)
doe : [{first:john, ...}, {jane}],
smith: [{jimmy}, {ted}, {anna}]
Does this require retrieving the common data (last name) with distinct or group by first, and then iterating with additional queries (where last="smith") for each name?
I'd think that naive approach is likely inefficient and that there are better solutions.

You need the GROUP_CONCAT() aggregate function:
SELECT last, GROUP_CONCAT(first) first
FROM tablename
GROUP BY last
or JSON_GROUP_ARRAY():
SELECT last, JSON_GROUP_ARRAY(first) first
FROM tablename
GROUP BY last
See the demo.
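For comparison with the per-name iteration the question worries about, here is a minimal sketch in Python's sqlite3 module. It assumes an SQLite database and reuses the table name tablename from the queries above; the helper names are made up for illustration. One function fetches everything in a single ordered query and groups on the client, the other lets GROUP_CONCAT() do the grouping.

```python
import sqlite3
from collections import defaultdict


def group_by_last_name(conn):
    """One query, grouped client-side: {last: [{"first": ...}, ...]}."""
    groups = defaultdict(list)
    query = "SELECT last, first FROM tablename ORDER BY last, rowid"
    for last, first in conn.execute(query):
        groups[last].append({"first": first})
    return dict(groups)


def concat_by_last_name(conn):
    """Same grouping done by the database with GROUP_CONCAT()."""
    query = "SELECT last, GROUP_CONCAT(first) FROM tablename GROUP BY last"
    # GROUP_CONCAT joins with commas by default; the order of the joined
    # values is not guaranteed, so sort after splitting.
    return {last: sorted(firsts.split(",")) for last, firsts in conn.execute(query)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (last TEXT, first TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("doe", "john"), ("doe", "jane"),
     ("smith", "jimmy"), ("smith", "ted"), ("smith", "anna")],
)

result = group_by_last_name(conn)
concatenated = concat_by_last_name(conn)
```

Either way the database is queried once, rather than once per last name.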

Related

Add columns to a set of results depending how many rows found

I don't have sample data that fits the example below, and it's more a theoretical question than a data-driven one...
I have a table called CustomerOrders. A query looks to see if any customers haven't ordered anything for more than 4 days (again, it's just an example but easier than explaining the real purpose).
If there are such customers, then the query searches a Communications table that records whether or not sales staff have noted that it's been four days or more since an order was received from that customer, and what action they're taking to address this.
Depending on the number of days since the last order, and the number of times sales staff have logged their acknowledgement (ideally it should be every day until they place an order), each customer appears in the results like this:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate, SalesComment
At present, each comment that sales staff log about this date gap appears as a separate row in the result set, with every row repeating the same values apart from the last two columns.
What I would prefer is for this result set to be set out as:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate[1], SalesComment[1], SalesStaffCommentDate[2], SalesComment[2]
etc, with the number of additional comment and date columns showing the comments made, but all on one row.
But if the sales team only logged two comments on one customer, but ten comments on another, there is obviously a disparity between the number of columns that could be filled.
Is it possible to display the data in this way?
EDIT - thanks to @Larnu and @Smor so far.
To try and give a bit more data. This is how my data looks:
NAME        LASTORDERDATE  NUMDAYSSINCE  SALESSTAFFCOMMENTDATE  SALESCOMMENT
John Smith  2022-06-12     5             2022-06-15             Tried to call
John Smith  2022-06-12     5             2022-06-16             Call back later
John Smith  2022-06-12     5             2022-06-17             Not required
I want it to look like this:
John Smith 2022-06-12 5 2022-06-15 Tried to call 2022-06-16 Call back later 2022-06-17 Not required.
There may be anything from 1 to 10 entries before the customer orders again and resets the counter back to fewer than 4 days since their last order.
@Larnu, are you saying that the link you give allows me to present the data in this way? Ordinarily I would export this data to PBI and pivot it to display as I need it to, but for this bit of data I'm unable to do that, and so it needs to be in SQL.
Hope that clarifies things in case I was being a bit too vague.
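One possible way to get this shape in SQL is to number each customer's comments and then use conditional aggregation, with a fixed upper bound on the number of comment slots (10 in the description above). The sketch below is only an illustration under several assumptions: it runs against SQLite (3.25 or later, for window functions) from Python rather than the asker's actual database, and the table name report_rows and its column names are hypothetical stand-ins for the existing result set.

```python
import sqlite3


def build_pivot_sql(max_comments):
    """Build a query with one (date, comment) column pair per comment slot."""
    slots = ",\n      ".join(
        f"MAX(CASE WHEN rn = {i} THEN comment_date END) AS comment_date_{i}, "
        f"MAX(CASE WHEN rn = {i} THEN comment END) AS comment_{i}"
        for i in range(1, max_comments + 1)
    )
    return f"""
    SELECT name, last_order_date, num_days_since,
      {slots}
    FROM (
      SELECT *, ROW_NUMBER() OVER (
        PARTITION BY name ORDER BY comment_date
      ) AS rn
      FROM report_rows
    )
    GROUP BY name, last_order_date, num_days_since
    """


conn = sqlite3.connect(":memory:")
# report_rows is a hypothetical table holding the current one-row-per-comment output.
conn.execute(
    "CREATE TABLE report_rows (name TEXT, last_order_date TEXT, "
    "num_days_since INTEGER, comment_date TEXT, comment TEXT)"
)
conn.executemany(
    "INSERT INTO report_rows VALUES (?, ?, ?, ?, ?)",
    [
        ("John Smith", "2022-06-12", 5, "2022-06-15", "Tried to call"),
        ("John Smith", "2022-06-12", 5, "2022-06-16", "Call back later"),
        ("John Smith", "2022-06-12", 5, "2022-06-17", "Not required"),
    ],
)

wide_rows = conn.execute(build_pivot_sql(max_comments=4)).fetchall()
```

Slots that a customer does not fill come back as NULL, so a customer with three comments and a limit of four gets two empty trailing columns. A truly variable number of columns would need dynamically generated SQL, which is essentially what build_pivot_sql does here.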

Counting common visited countries

This is a simplified post from another question.
Consider this:
How many visited countries do John and Mary have in common? Same question for John and Alfred, and for Alfred and Mary.
Here is a Google Sheet to play with: https://docs.google.com/spreadsheets/d/1jWAXVGt2_E3fYo8WZSBP1Fp-vg3gYPKlG2ZxC-4SE34/edit?usp=sharing
Try this:
=ArrayFormula(sum(countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))*countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))))
As far as I can see there are four correct answers to this question depending on how you pose it, rather like in SQL:
(1) for every instance of person 1 with a country, how many instances of person 2 are there with the same country including duplicates (like a cross join)
(2) for every unique combination of person 1 with a country, how many instances of person 2 with the same country are there (like a left join)
(3) for every unique combination of person 2 with a country, how many instances of person 1 with the same country are there (like a right join)
(4) for each unique combination of person 1 with a country, is there at least one instance of person 2 with the same country (like an inner join)
I have gone for option (1).
The other three formulas should be:
=ArrayFormula(sum((countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))>0)*countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))))
=ArrayFormula(sum(countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))*(countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))>0)))
=ArrayFormula(sum((countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))>0)*(countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))>0)))
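To make the difference between the four counts concrete, here is a small Python sketch of the same arithmetic. The visit data is invented for illustration (the sheet's contents are not reproduced here), and the dictionary keys are just descriptive labels. As in the formulas, each count sums over the unique countries and multiplies either a raw count or a "count > 0" flag for each person.

```python
from collections import Counter

# Hypothetical (person, country) rows; duplicates mean repeat visits.
visits = [
    ("John", "France"), ("John", "France"), ("John", "France"), ("John", "Spain"),
    ("Mary", "France"), ("Mary", "Spain"), ("Mary", "Spain"), ("Mary", "Italy"),
    ("Alfred", "Italy"),
]


def common_counts(visits, a, b):
    """Return the four ways of counting countries shared by persons a and b."""
    count_a = Counter(country for person, country in visits if person == a)
    count_b = Counter(country for person, country in visits if person == b)
    countries = set(count_a) | set(count_b)
    return {
        # count * count: every pairing of visits, duplicates included
        "all_pairs": sum(count_a[c] * count_b[c] for c in countries),
        # (count > 0) * count: b's visits to countries a has seen
        "b_visits_in_a_countries": sum((count_a[c] > 0) * count_b[c] for c in countries),
        # count * (count > 0): a's visits to countries b has seen
        "a_visits_in_b_countries": sum(count_a[c] * (count_b[c] > 0) for c in countries),
        # (count > 0) * (count > 0): distinct countries both have visited
        "distinct_common": sum(1 for c in countries if count_a[c] and count_b[c]),
    }


john_mary = common_counts(visits, "John", "Mary")
john_alfred = common_counts(visits, "John", "Alfred")
```

With this made-up data, John and Mary share two distinct countries, but the other three counts differ because of the repeated visits.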

Vlookup array multiple columns

Excel wizards,
I'm trying to build a report with a simple drop down list of names. Rather than try to explain in more detail, let me give you a sample dataset:
Table1:
Text Person1 Person2 Person3
String here contains name(s) Mike Smith Robert Johnson Suzy Q
Another string with name(s) Dan Boy John Michael Bob Wise
Different string with name(s) Robert Johnson Suzy Q
In my report sheet, I have a drop-down list of all the possible "persons" that I want to choose from, and I then want to return all matching values from the "Text" column in an array. I have been able to make it work with only one column using this formula, where C4 contains my choice in the drop-down list:
INDEX(Table1[#All],SMALL(IF(Table1[Person1]=$C$4,ROW(Table1[Person1])),ROW(1:1)),1)
The text column will contain all the names of the Person columns, but they are in a different case (all caps, can't change format for display purposes). Maybe a SEARCH function would be more useful? I'm not sure. I'm trying to avoid using a macro, but I am not completely opposed.
Let me know what you guys think, and thanks in advance!
Simply reorganize your table so that there's one row per name, then VLOOKUP on the name to get the matching list.
Person Text
Mike Smith String with names
Robert Johnson String with names
Suzy Q String with names
Dan Boy Second string with names
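The reorganization itself is a plain unpivot. As a rough illustration outside Excel, the Python sketch below turns the wide layout into one (person, text) row per name and then collects every text for a chosen person. The sample rows are modeled on Table1; which Person column each name sits in, and the helper names, are assumptions made for the example.

```python
# Rows modeled on Table1; the third row is assumed to have an empty Person3.
table1 = [
    {"Text": "String here contains name(s)",
     "Person1": "Mike Smith", "Person2": "Robert Johnson", "Person3": "Suzy Q"},
    {"Text": "Another string with name(s)",
     "Person1": "Dan Boy", "Person2": "John Michael", "Person3": "Bob Wise"},
    {"Text": "Different string with name(s)",
     "Person1": "Robert Johnson", "Person2": "Suzy Q", "Person3": None},
]


def unpivot(rows, person_columns=("Person1", "Person2", "Person3")):
    """Return one (person, text) pair per non-empty Person cell."""
    long_rows = []
    for row in rows:
        for column in person_columns:
            name = row.get(column)
            if name:
                long_rows.append((name, row["Text"]))
    return long_rows


def texts_for(long_rows, person):
    """Collect every text associated with the selected person."""
    return [text for name, text in long_rows if name == person]


long_rows = unpivot(table1)
robert_texts = texts_for(long_rows, "Robert Johnson")
```

Once the data is in this long form, the lookup no longer depends on which Person column a name happened to be in.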
Are you trying to build validations for teams, e.g. select a team and have the next drop-down offer only members of that team?
You can use OFFSET inside validation. In one cell, put a validation for the list of teams. In the other cell, create a list validation and use an OFFSET formula to return the range of members based on the selected team.
Edit: I'm not sure how to put this in a table, but this is how you would fill a range with VLOOKUP:
In the table with the entries, add a column with serial numbers running from 1 to n.
Just below the drop-down box, enter the numbers 1 to n in order.
VLOOKUP the serial number in the table; that is the row you are looking up.
For the column, use MATCH to find which column of the table holds the currently selected person.
Drag the formula down to fill all n rows.

How to find the minimum fields needed to identify a unique row in a set of data

Say I have a bunch of data on some people. This could include Name, DOB, Address, Email, etc... Assume there are no unique identifiers (like an id column) on this data, but also assume that there are no repeating rows. I need to figure out the minimum set of fields I can use to query that data and return a unique row.
An example of a solution would be: "I can make a query that specifies a first name, dob, email, and zip, and that would return exactly one or zero rows."
Did I ask that in a way that makes sense? I am looking for a technique, algorithm, or software package that would solve this problem for a given set of data. Anything that could provide an answer would work. Thanks!
EXAMPLE DATA (the real stuff is much more complex):
FNAME LNAME DOB ZIP email
John Smith 1/1/12 77777 dude@fake.com
Sean Smith 1/2/08 77777 dude@fake.com
Sean William 4/2/07 77789 stuff@fake.com
Richard Ross 1/1/12 78989 foo@fake.com
The solution for this set of data would be (FNAME, LNAME) or (EMAIL, DOB) or (EMAIL, FNAME).
I think you will need an iterative approach.
Perhaps you can begin with each single column and attempt to create a unique index on it.
If you succeed, you are done.
If you are unable to create a unique index, add another column and try again.
Do this for all column combinations until you can successfully create the index.
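The iterative idea can be sketched in Python without touching the database: try every combination of one column, then two, and so on, and stop at the first size where some combination has no duplicate values. This is a brute-force sketch, not an optimized algorithm; the number of combinations grows exponentially with the column count, and the function name is made up for illustration. The rows are the example data from the question.

```python
from itertools import combinations


def minimal_unique_column_sets(rows, columns):
    """Return all smallest column combinations whose values are unique per row."""
    for size in range(1, len(columns) + 1):
        found = [
            combo
            for combo in combinations(columns, size)
            # Unique if projecting onto the combo yields as many distinct tuples as rows.
            if len({tuple(row[c] for c in combo) for row in rows}) == len(rows)
        ]
        if found:
            return found
    return []


columns = ["FNAME", "LNAME", "DOB", "ZIP", "EMAIL"]
values = [
    ("John", "Smith", "1/1/12", "77777", "dude@fake.com"),
    ("Sean", "Smith", "1/2/08", "77777", "dude@fake.com"),
    ("Sean", "William", "4/2/07", "77789", "stuff@fake.com"),
    ("Richard", "Ross", "1/1/12", "78989", "foo@fake.com"),
]
rows = [dict(zip(columns, value)) for value in values]

keys = minimal_unique_column_sets(rows, columns)
```

For the example data no single column is unique, so keys contains only two-column combinations, including (FNAME, LNAME) and (DOB, EMAIL). Note that a combination that is unique in a sample is not guaranteed to stay unique as more rows arrive.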

What is the most effective way to handle lots of tables in a database?

I am new to database programming and am using SQLite and Python. As an example, let's say I have a database named Animals.db, which I open and get a cursor for in Python. Now if I wanted to separate the animals by species I would have a different table per species, and since it can get even more specific I would likely need something more specific than just a table of species.
I am a bit confused about how one allocates the correct data to the correct area of a database. How is it separated? Are there tables of tables?
If I wanted to, let's say, have one table for every land animal and another for every animal of the sea, but each table needed further specification (Homo sapiens, etc.), how could I do that?
"Now if I wanted to separate the animals by species I would have a different table per species"
Maybe. Maybe not. You might use a table that looked like this. It depends entirely on what you mean by "separate the animals by species". Here's one reasonable interpretation.
Animal_name  Sex  Species
-----------  ---  ------------------
Jack         M    Leopardus pardalis
Susie        F    Leopardus pardalis
Kimmie       M    Leopardus pardalis
Susie        F    Stenella clymene
Ginger       F    Stenella clymene
Mary Ann     F    Stenella clymene
To find all the Clymene dolphins, you might use a query along these lines.
select Animal_name
from animals
where species = 'Stenella clymene'
order by Animal_name
Animal_name
--
Ginger
Mary Ann
Susie
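Since the question mentions SQLite and Python, here is a minimal sketch of the same single-table design using the sqlite3 module. It uses an in-memory database instead of Animals.db so it can run anywhere, and the helper name names_for_species is made up for illustration. The point is that species is just a column value, so adding a new species means inserting rows, not creating tables.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file such as Animals.db works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (animal_name TEXT, sex TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        ("Jack", "M", "Leopardus pardalis"),
        ("Susie", "F", "Leopardus pardalis"),
        ("Kimmie", "M", "Leopardus pardalis"),
        ("Susie", "F", "Stenella clymene"),
        ("Ginger", "F", "Stenella clymene"),
        ("Mary Ann", "F", "Stenella clymene"),
    ],
)


def names_for_species(conn, species):
    """Return the sorted animal names for one species, using a bound parameter."""
    rows = conn.execute(
        "SELECT animal_name FROM animals WHERE species = ? ORDER BY animal_name",
        (species,),
    )
    return [name for (name,) in rows]


dolphins = names_for_species(conn, "Stenella clymene")
```

A broader split such as land versus sea could be handled the same way, for instance with another column, rather than with separate tables.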
Start by collecting data. Your goal is to collect a set of representative sample data. Sample data, because the full population is too big to handle. Representative, because ideally it represents all the problems you're likely to run into with the full population. If "animal name" to you doesn't mean "Jack" or "Ginger", but "ocelot" and "Clymene dolphin", representative sample data will make that clear.
