Matching employees from different school and hometown - database

I am new to coding. I have an employee table that looks like this:
Name      Hometown    School
Jeff      Illinois    Loyola University Chicago
Alice     California  New York University
William   Michigan    University of Illinois at Chicago
Fiona     California  Loyola University Chicago
Charles   Michigan    New York University
Linda     Indiana     Loyola University Chicago
I am trying to put the employees into pairs where the two employees come from different states and different universities. Each person can be in only one pair. The expected table should look like this:
employee1  employee2
Jeff       Alice
William    Fiona
Charles    Linda
The real table is over 3,000 rows. I am trying to do it with SQL or Python, but I don't know where to start.

A straightforward approach is to go through the employees one by one and, for each, search the rows after it for a suitable peer; peers that have been found are flagged so that they are not paired again. Since in your case a peer should usually be found within a few steps, this iteration will likely be faster than operations that construct whole data sets at once.
from io import StringIO
import pandas as pd

# read example employee table (columns separated by two or more spaces)
df = pd.read_table(StringIO("""Name      Hometown    School
Jeff      Illinois    Loyola University Chicago
Alice     California  New York University
William   Michigan    University of Illinois at Chicago
Fiona     California  Loyola University Chicago
Charles   Michigan    New York University
Linda     Indiana     Loyola University Chicago
"""), sep=r"\s{2,}", engine="python")
# create expected table; its length is half that of the above
# (integer division, because RangeIndex needs an int)
ef = pd.DataFrame(index=pd.RangeIndex(len(df) // 2), columns=['employee1', 'employee2'])
k = 0   # number of found pairs, index into expected table
# array of flags for already paired employees
paired = pd.Series(False, pd.RangeIndex(len(df)))
# go through the employee table and collect pairs
for i in range(len(df)):
    if paired[i]: continue
    for j in range(i+1, len(df)):
        if not paired[j] \
           and df.iloc[j]['Hometown'] != df.iloc[i]['Hometown'] \
           and df.iloc[j]['School'] != df.iloc[i]['School']:
            # we found a pair - store it, mark employee j paired
            # (to_numpy() assigns by position instead of aligning on labels)
            ef.iloc[k] = df.iloc[[i, j]]['Name'].to_numpy()
            k += 1
            paired[j] = True
            break
    else:
        # the inner loop ended without break: nobody left to pair with
        print("no peer for", df.iloc[i]['Name'])
print(ef)
output:
  employee1 employee2
0      Jeff     Alice
1   William     Fiona
2   Charles     Linda
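The same greedy search can also be written as a small function over plain tuples, which avoids repeated `df.iloc` lookups on a larger table. This is only a sketch: the function name `pair_employees` and the tuple layout are assumptions made for illustration, and a greedy pass like this is not guaranteed to find the largest possible number of pairs.

```python
from typing import List, Tuple

Employee = Tuple[str, str, str]  # (name, hometown, school); layout is an assumption


def pair_employees(employees: List[Employee]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Greedily pair each unpaired employee with the first later unpaired
    employee who has a different hometown and a different school."""
    paired = [False] * len(employees)
    pairs: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for i, (name_i, home_i, school_i) in enumerate(employees):
        if paired[i]:
            continue
        for j in range(i + 1, len(employees)):
            name_j, home_j, school_j = employees[j]
            if not paired[j] and home_j != home_i and school_j != school_i:
                pairs.append((name_i, name_j))
                paired[i] = paired[j] = True
                break
        else:
            # no suitable peer among the remaining rows
            unmatched.append(name_i)
    return pairs, unmatched


employees = [
    ("Jeff", "Illinois", "Loyola University Chicago"),
    ("Alice", "California", "New York University"),
    ("William", "Michigan", "University of Illinois at Chicago"),
    ("Fiona", "California", "Loyola University Chicago"),
    ("Charles", "Michigan", "New York University"),
    ("Linda", "Indiana", "Loyola University Chicago"),
]
pairs, unmatched = pair_employees(employees)
print(pairs)      # list of (employee1, employee2) tuples
print(unmatched)  # names for which no peer was found
```

If the data is already in a DataFrame, `list(df.itertuples(index=False, name=None))` should give tuples in this shape.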

Related

AngularJS validating a row as a single compound entity

I've got a structure of authors, readers, and preferences (high, medium, low, or blacklisted). All preferences are medium unless a rule overrides it. Whenever a new book becomes available, one reader should be given the opportunity to read that book. Some readers like particular authors, so the rule for that would look like:
author           reader      preference
Charles Dickens  Joe Bloggs  high
Some authors might be blacklisted for all readers (null):
author        reader  preference
Adolf Hitler  null    blacklisted
But a reader can override that rule:
author        reader      preference
Adolf Hitler  Mary Jones  medium
Some readers mightn't have as much time to read and want to read fewer books by any author (null):
author  reader      preference
null    John Smith  low
But that reader might make time for a particular author:
author            reader      preference
Geoffrey Chaucer  John Smith  high
So, all those rules in combination look like:
author            reader      preference
Charles Dickens   Joe Bloggs  high
Adolf Hitler      null        blacklisted
Adolf Hitler      Mary Jones  medium
null              John Smith  low
Geoffrey Chaucer  John Smith  high
When a new book by Charles Dickens becomes available, Joe Bloggs is more likely to be given the opportunity to read it, and John Smith is less likely, with Mary Jones having an average chance.
When a new book by Geoffrey Chaucer becomes available, Joe Bloggs and Mary Jones both have an average chance to be given the opportunity to read it, and John Smith is more likely.
If a new book by Adolf Hitler becomes available, Joe Bloggs will not be given an opportunity to read it, Mary Jones will have an average chance, and the rules for John Smith are inconsistent (Adolf Hitler blacklisted for any reader and any author low preference) and need to be resolved.
When the rules are being edited, I need a way to validate all the rules against each other. For instance, Joe Bloggs can't say that his preference for Charles Dickens is both high and low. If that happened, I want both rules to be highlighted. And in the above example, the two rules "Adolf Hitler, null, blacklisted" and "null, John Smith, low" should both be highlighted until resolved, for example with another row "Adolf Hitler, John Smith, blacklisted".
However, in AngularJS, I only know how to validate single inputs, not how to treat the entire row as a single input and also to make sure it doesn't clash with any other rows.
e.g.
<textarea ng-model="description" ng-minlength="5" ng-maxlength="255" required="required"></textarea>
How can I validate a compound item i.e. {rules: [{author: 1, reader: 2, preference: "high"}]} against all other items?
<div ng-repeat="rule in rules">
  <input ng-model="rule.author">
  <input ng-model="rule.reader">
  <input ng-model="rule.preference">
</div>
I.e. I want to validate rule as an entire object, not just rule.author, rule.reader, and rule.preference.
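The cross-rule check described above does not depend on the UI framework, so it may help to pin the logic down first. The sketch below is in Python rather than AngularJS, and the function name `conflicting_rules` and the two conflict cases are assumptions read off the examples in this question: (1) the same author and reader with different preferences, and (2) an author-wide rule and a reader-wide rule that disagree with no specific author-and-reader rule to resolve them. It returns the indices of the rows that would be highlighted.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

# (author, reader, preference); None stands for "any" (null in the tables)
Rule = Tuple[Optional[str], Optional[str], str]


def conflicting_rules(rules: List[Rule]) -> Set[int]:
    """Return the indices of rules that clash with some other rule."""
    # author/reader combinations that have their own explicit rule
    specific = {(a, r) for a, r, _ in rules if a is not None and r is not None}
    flagged: Set[int] = set()
    for (i, (a1, r1, p1)), (j, (a2, r2, p2)) in combinations(enumerate(rules), 2):
        if p1 == p2:
            continue  # identical preferences never clash
        # Case 1: same author and same reader, different preferences
        if (a1, r1) == (a2, r2):
            flagged.update((i, j))
            continue
        # Case 2: an author-wide rule meets a reader-wide rule
        if a1 is not None and r1 is None and a2 is None and r2 is not None:
            author, reader = a1, r2
        elif a1 is None and r1 is not None and a2 is not None and r2 is None:
            author, reader = a2, r1
        else:
            continue
        # a specific rule for this author and reader resolves the clash
        if (author, reader) not in specific:
            flagged.update((i, j))
    return flagged


rules: List[Rule] = [
    ("Charles Dickens", "Joe Bloggs", "high"),
    ("Adolf Hitler", None, "blacklisted"),
    ("Adolf Hitler", "Mary Jones", "medium"),
    (None, "John Smith", "low"),
    ("Geoffrey Chaucer", "John Smith", "high"),
]
print(conflicting_rules(rules))
print(conflicting_rules(rules + [("Adolf Hitler", "John Smith", "blacklisted")]))
```

A function of this shape could be ported to JavaScript and re-run over the whole `rules` array whenever any row changes, with the returned indices driving the highlighting.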

Google Sheets - finding first match in array formula

Okay, I have found solutions to many problems in Google Sheets, but this one is hard as a rock. :)
I have a sheet where in column C are various names and in column D are professions. For example:
C / D
John Smith / plumber
Paul Anderson / carpenter
Sarah Palmer / dentist
Jonah Huston / carpenter
Laura Jones / dentist
Sid Field / carpenter
...etc
(as you can see, every name is unique, but professions repeat several times)
I'd like to see in column F the last matching name of the same profession
C / D / F
John Smith / plumber / (N/A)
Paul Anderson / carpenter / Jonah Huston
Sarah Palmer / dentist / Laura Jones
Jonah Huston / carpenter / Sid Field
Laura Jones / dentist / (N/A)
Sid Field / carpenter / (N/A)
...etc
It works fine with the INDEX and FILTER functions, but I have to copy the formula over and over again as I add extra rows. This is the formula I use:
=IFERROR(INDEX(FILTER($C4:$C,$D4:$D=D6,$A4:$A<A6),1))
I'm looking for a solution with Array Formula to autofill all cells in column F, and tried various versions (Lookup, Vlookup..etc), but couldn't find the right formula.
Any guidance would be appreciated. :)
try:
=INDEX(IF(C1:C="",,IFNA(VLOOKUP(
D1:D&COUNTIFS(D1:D, D1:D, ROW(D1:D), "<="&ROW(D1:D)), {
D1:D&COUNTIFS(D1:D, D1:D, ROW(D1:D), "<="&ROW(D1:D))-1,
C1:C}, 2, 0), "(N/A)")))
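To see why this formula works, it may help to restate it outside the spreadsheet. Each row gets a key made of its profession plus a running count (the `COUNTIFS` part); the lookup table uses the same key with the count reduced by one, so a row's key matches the next row of the same profession. The Python sketch below mirrors that idea; the function name `next_same_profession` is an assumption made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple


def next_same_profession(rows: List[Tuple[str, str]], missing: str = "(N/A)") -> List[str]:
    """For each (name, profession) row, return the name on the next row
    that has the same profession, or `missing` if there is none."""
    seen = defaultdict(int)
    keyed = []
    for name, profession in rows:
        seen[profession] += 1  # running count, like COUNTIFS up to this row
        keyed.append((profession, seen[profession], name))
    # lookup table keyed by (profession, running count - 1)
    lookup = {(prof, n - 1): name for prof, n, name in keyed}
    return [lookup.get((prof, n), missing) for prof, n, _ in keyed]


rows = [
    ("John Smith", "plumber"),
    ("Paul Anderson", "carpenter"),
    ("Sarah Palmer", "dentist"),
    ("Jonah Huston", "carpenter"),
    ("Laura Jones", "dentist"),
    ("Sid Field", "carpenter"),
]
for (name, profession), match in zip(rows, next_same_profession(rows)):
    print(name, "/", profession, "/", match)
```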

How to SUM part of a UNIQUE ARRAY formula

Hi I can't figure this out for the life of me! On the second tab 'Combined' I'm trying to get the array to show a combination of unique dates, names and then a total of the amount to pay that person.
https://docs.google.com/spreadsheets/d/1sSuHK0h2OeaEJpraoHXTi01XIwpreM47BovQAV-snDE/edit?usp=sharing
So ideally it should show on the 'Combined' page:
A
11/15/2020 Bill Jones $553.80
11/15/2020 Steve Robinson $320.00
10/7/2019 Grady Johnson $100.12
11/15/2020 Grady Johnson $45.00
11/22/2020 Jim Luke $300.43
11/17/2020 Jim Luke $1,357.63
I've been trying to figure this out for days - please help!
use:
=QUERY(Investors!A4:C,
"select A,B,sum(C)
where A is not null
group by A,B
label sum(C)''")

Looking up values within a range of cells

Suppose I have the following data table in Excel
Company   Amount    Text
Oracle    $3,400    330 Richard ERP
Walmart   $750      348 Mary ERP
Amazon    $6,880    xxxx Loretta ERP
Rexel     $865      0000 Mike ERP
Toyota    $11,048   330 Richard ERP
I want to go through each item in the "Text" column, search the item against the following range of names:
Mary
Mike
Janine
Susan
Richard
Jerry
Loretta
and return the name in the "Person" column, if found. For example:
Company   Amount    Text               Person
Oracle    $3,400    330 Richard ERP    Richard
Walmart   $750      348 Mary ERP       Mary
Amazon    $6,880    xxxx Loretta ERP   Loretta
Rexel     $865      0000 Mike ERP      Mike
Toyota    $11,048   330 Richard ERP    Richard
I've tried the following in Excel which works:
=IF(N2="","",
IF(ISNUMBER(SEARCH(Sheet2!$A$1,N2)),Sheet2!$A$1,
IF(ISNUMBER(SEARCH(Sheet2!$A$2,N2)),Sheet2!$A$2,
IF(ISNUMBER(SEARCH(Sheet2!$A$3,N2)),Sheet2!$A$3,
....
Where $A$1:$A$133 is my range and N2 is the "Text" column values; however, that is a lot of nested code and apparently Excel has a limit on the number of nested IF statements you can have.
Is there a simpler solution (arrays? VBA?)
Thanks!
Use the following formula:
=IFERROR(INDEX(Sheet2!A:A,AGGREGATE(15,6,ROW(Sheet2!$A$1:INDEX(Sheet2!A:A,MATCH("zzz",Sheet2!A:A)))/(ISNUMBER(SEARCH(Sheet2!$A$1:INDEX(Sheet2!A:A,MATCH("zzz",Sheet2!A:A)),N2))),1)),"")
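The formula tests every name in the list against the text with `SEARCH`, keeps the row numbers of the names that were found, and returns the name on the smallest such row, or an empty string if none match. The same logic can be sketched in Python as below; the function name `first_name_in_text` is an assumption made for illustration, and plain case-insensitive substring matching stands in for `SEARCH` (wildcards are not handled).

```python
from typing import List


def first_name_in_text(text: str, names: List[str]) -> str:
    """Return the first name (in list order) that occurs in `text`,
    ignoring case; return "" when no name matches."""
    lowered = text.lower()
    for name in names:
        if name and name.lower() in lowered:
            return name
    return ""


names = ["Mary", "Mike", "Janine", "Susan", "Richard", "Jerry", "Loretta"]
texts = [
    "330 Richard ERP",
    "348 Mary ERP",
    "xxxx Loretta ERP",
    "0000 Mike ERP",
    "330 Richard ERP",
]
persons = [first_name_in_text(t, names) for t in texts]
print(persons)
```

As with the formula, this is a substring match, so a short name would also match inside a longer word.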

The optimal way to enter multiple addresses for the same record within a form?

So I've been developing a sort of data entry platform within Access using forms and subforms.
I have a form titled PHYSICIAN. Each physician will have basic data like first/last name, DOB, title, contract dates, etc. The aspect I'm wanting to cover is addresses as they may have multiple, since they may work/practice at 2 or 3 or even 10 different locations.
Instead of having our data entry team key in a full record each time they need to add an address, I'd like a way for the form to retain ALL information not related to the address.
So if Ken Bone works at 7 places, I want to allow them to key all of those addresses a bit more efficiently than creating a new record.
There's one main issue I'm running into --- A subform or autopopulate option doesn't necessarily increment the autonumber ID (primary key) for the record. All of the information is being stored in 1 master table.
Is there a way around this or a more logical approach that you folks might suggest?
I recommend that you have a couple of tables, perhaps even three.
tblDoctorInfo
- Dr_ID
- Name
- DOB
- Title
tblAddresses
- AddressID
- Address1
- Address2
- City
- State
- Zip
- Country
tblDr_Sites
- DrSites_ID
- Dr_ID
- AddressID
The tables might have data like this.
tblDoctorInfo
1, Bob Smith, 12/3/1989, Owner
2, Carl Jones, 1/2/1977, CEO
3, Carla Smith, 5/3/1980, ER Surgeon
tblAddresses
1, 123 Elm St, Fridley, MN 55038
2, 234 7th St, Brookdale, MN 55412
3, 345 Parl Ave, Clinton, MN 55132
Then you could associate the tables with the third table. (Note that each of the three tables has an ID field that increments.)
tblDr_Sites
1,1,1 This record means Dr. Bob works in Fridley
2,1,2 This record means Dr. Bob works in Brookdale
3,3,1 This record means Dr. Carla works in Fridley
4,2,3 This record means Dr. Carl works in Clinton
5,2,2 This record means Dr. Carl works in Brookdale
6,2,1 This record means Dr. Carl works in Fridley
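The three-table layout can be tried out quickly outside Access. The sketch below uses Python's built-in `sqlite3` module as a stand-in (an assumption; Access SQL differs in details such as data types), keeps only some of the address columns, and joins the link table back to doctors and addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblDoctorInfo (Dr_ID INTEGER PRIMARY KEY, Name TEXT, DOB TEXT, Title TEXT);
CREATE TABLE tblAddresses (AddressID INTEGER PRIMARY KEY, Address1 TEXT,
                           City TEXT, State TEXT, Zip TEXT);
-- link table: one row per (doctor, address) combination
CREATE TABLE tblDr_Sites (
    DrSites_ID INTEGER PRIMARY KEY,
    Dr_ID      INTEGER REFERENCES tblDoctorInfo(Dr_ID),
    AddressID  INTEGER REFERENCES tblAddresses(AddressID)
);
""")
conn.executemany("INSERT INTO tblDoctorInfo VALUES (?, ?, ?, ?)", [
    (1, "Bob Smith", "12/3/1989", "Owner"),
    (2, "Carl Jones", "1/2/1977", "CEO"),
    (3, "Carla Smith", "5/3/1980", "ER Surgeon"),
])
conn.executemany("INSERT INTO tblAddresses VALUES (?, ?, ?, ?, ?)", [
    (1, "123 Elm St", "Fridley", "MN", "55038"),
    (2, "234 7th St", "Brookdale", "MN", "55412"),
    (3, "345 Parl Ave", "Clinton", "MN", "55132"),
])
conn.executemany("INSERT INTO tblDr_Sites VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 1, 2), (3, 3, 1), (4, 2, 3), (5, 2, 2), (6, 2, 1),
])

# list every doctor together with each city they work in
rows = conn.execute("""
    SELECT d.Name, a.City
    FROM tblDr_Sites AS s
    JOIN tblDoctorInfo AS d ON d.Dr_ID = s.Dr_ID
    JOIN tblAddresses  AS a ON a.AddressID = s.AddressID
    ORDER BY s.DrSites_ID
""").fetchall()
for name, city in rows:
    print(name, "works in", city)
```

Adding another location for a doctor is then a single new row in tblDr_Sites, with no need to re-enter the doctor's details.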
