I am not an SQL expert and have the following challenge ahead of me. I have a table that contains a note field next to the name of the person concerned. The note field is free text and can contain the person's name. I would like to anonymize it.
An example for better understanding: Table "Reports"
ID | PersonID | Name | Notefield
1 | 978 | Max | Max isn't feeling so good today.
2 | 234 | Julia | Julia's blood sugar has improved.
3 | ...
The result should look like this:
ID | PersonID | Name | Notefield
1 | 978 | Max | M. isn't feeling so good today.
2 | 234 | Julia | J. blood sugar has improved.
3 | ...
So I want to change the note field depending on the name. Can anyone here help?
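You can use REPLACE:
REPLACE(Notefield, Name, LEFT(Name, 1) + '.')
This can be dangerous because it also replaces the name where it occurs inside ordinary words, e.g. "Maximize". To avoid that, search for the name followed by a space and put the space back in the replacement:
REPLACE(Notefield, Name + ' ', LEFT(Name, 1) + '. ')
Note that this variant no longer matches a name followed by punctuation or an apostrophe (such as "Julia's") or a name at the very end of the note.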
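To see the difference between the two expressions, here is a minimal sketch that runs both against an in-memory SQLite database from Python. It is an illustration under assumptions, not the SQL Server syntax itself: SQLite spells the expressions with `substr` and `||` instead of `LEFT` and `+`, and the third row (PersonID 555, containing "Maximize") is hypothetical test data added to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reports (ID INTEGER, PersonID INTEGER, Name TEXT, Notefield TEXT)"
)
conn.executemany(
    "INSERT INTO Reports VALUES (?, ?, ?, ?)",
    [
        (1, 978, "Max", "Max isn't feeling so good today."),
        (2, 234, "Julia", "Julia's blood sugar has improved."),
        # Hypothetical row, not from the question: the name also appears inside a word.
        (3, 555, "Max", "Max wants to Maximize his rest."),
    ],
)

# Plain replacement: every occurrence of the name, even inside other words.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT replace(Notefield, Name, substr(Name, 1, 1) || '.') "
        "FROM Reports ORDER BY ID"
    )
]

# Replacement of the name followed by a space; the space is restored afterwards.
spaced = [
    row[0]
    for row in conn.execute(
        "SELECT replace(Notefield, Name || ' ', substr(Name, 1, 1) || '. ') "
        "FROM Reports ORDER BY ID"
    )
]

for before_after in zip(naive, spaced):
    print(before_after)
```

The plain version mangles "Maximize" in the third note, while the space-delimited version leaves it alone but also leaves "Julia's" untouched, so neither expression covers every case on its own.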
I have a question related to a kind of duplication I see in databases from time to time. To ask this question, I need to set the stage a bit:
Let's say I have a database of TV shows. Its primary table Content stores information at various levels of granularity (Show -> Season -> Episode), using a parent column to denote hierarchy:
+----+---------------------------+-------------+----------+
| ID | ContentName | ContentType | ParentId |
+----+---------------------------+-------------+----------+
| 1 | Friends | Show | [null] |
| 2 | Season 1 | Season | 1 |
| 3 | The Pilot | Episode | 2 |
| 4 | The One with the Sonogram | Episode | 2 |
+----+---------------------------+-------------+----------+
Maybe this isn't ideal, but let's say it's good enough to work with and we're not looking to change it.
Now let's say we need to build a table that defines air dates. We can set these at any level, and they must apply down the hierarchy (e.g., if set at the Season level, it applies to all episodes within that season; if set at the Show level, it applies to all seasons and episodes).
So the original air dates might look like this:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 71 | 3 | 1994-09-22 |
| 72 | 4 | 1994-09-29 |
+-------+-----------+------------+
Whereas the air date for a streaming service might look like:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
+-------+-----------+------------+
Cool. Everything's fine so far; we're adhering to 4NF (I think!) and we can proceed to our business logic.
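For illustration, business logic that respects the hierarchy could resolve an air date at query time by walking up the ParentId chain. The following is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module, and the table name StreamingAirDates and the function effective_air_date are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Content (
        ID INTEGER PRIMARY KEY, ContentName TEXT, ContentType TEXT, ParentId INTEGER
    );
    INSERT INTO Content VALUES
        (1, 'Friends', 'Show', NULL),
        (2, 'Season 1', 'Season', 1),
        (3, 'The Pilot', 'Episode', 2),
        (4, 'The One with the Sonogram', 'Episode', 2);

    -- Hypothetical table name for the streaming air dates shown above.
    CREATE TABLE StreamingAirDates (
        airId INTEGER PRIMARY KEY, ContentId INTEGER, AirDate TEXT
    );
    INSERT INTO StreamingAirDates VALUES (91, 1, '2015-01-01');
    """
)


def effective_air_date(content_id):
    """Return the air date set on the item itself or on its nearest ancestor."""
    row = conn.execute(
        """
        WITH RECURSIVE ancestors(ID, ParentId, depth) AS (
            SELECT ID, ParentId, 0 FROM Content WHERE ID = ?
            UNION ALL
            SELECT c.ID, c.ParentId, a.depth + 1
            FROM Content c JOIN ancestors a ON c.ID = a.ParentId
        )
        SELECT d.AirDate
        FROM ancestors a JOIN StreamingAirDates d ON d.ContentId = a.ID
        ORDER BY a.depth
        LIMIT 1
        """,
        (content_id,),
    ).fetchone()
    return row[0] if row else None


print(effective_air_date(4))  # episode inherits the date set at the Show level
```

With this approach only the single Show-level row is stored, and each season or episode derives its date from it.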
Now we get to my question. If we implement our business logic in such a way that disregards the referential hierarchy, and instead duplicates the air dates down the hierarchy, what is this anti-pattern called? e.g., Let's say I set an air date at the Show level like above, but the business logic finds all child elements and creates an entry for each one, resulting in:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
| 92 | 2 | 2015-01-01 |
| 93 | 3 | 2015-01-01 |
| 94 | 4 | 2015-01-01 |
+-------+-----------+------------+
There are some pretty clear problems with this, but please note that my question is not how to fix this. Just, is there a specific term for it? I want to call it something like, "disregarding data relationship" or, "ignoring referential context". Maybe it's not strictly a database anti-pattern, since in my example there's an external actor inserting the excess rows.
I have a person table which keeps some personal info, like the table below.
+----+------+----------+----------+--------+
| ID | name | motherID | fatherID | sex |
+----+------+----------+----------+--------+
| 1 | A | NULL | NULL | male |
| 2 | B | NULL | NULL | female |
| 3 | C | 1 | 2 | male |
| 4 | X | NULL | NULL | male |
| 5 | Y | NULL | NULL | female |
| 6 | Z | 5 | 4 | female |
| 7 | T | NULL | NULL | female |
+----+------+----------+----------+--------+
I also keep marriage relationships between people, like this:
+-----------+--------+
| HusbandID | WifeID |
+-----------+--------+
| 1 | 2 |
| 4 | 5 |
| 1 | 5 |
| 3 | 6 |
+-----------+--------+
With this information we can picture the relationship graph.
The question is: how can I get all connected people, given the ID of any one of them?
For example:
When I give ID=1, it should return 1,2,3,4,5,6 (order is not important).
Likewise, when I give ID=6, it should return 1,2,3,4,5,6 (order is not important).
Likewise, when I give ID=7, it should return 7.
Please note: the relationships (edges) between person nodes may form a loop anywhere in the graph. The example above shows only a small part of my data; the person and marriage tables may contain thousands of rows, and we do not know where loops may occur.
Similar questions have been asked in:
PostgreSQL SQL query for traversing an entire undirected graph and returning all edges found
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=118319
But I can't code the working SQL. Thanks in advance. I am using SQL Server.
From SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, eg
SELECT FORMATMESSAGE ( 'Person %s (%i) has mother %s (%i) and father %s (%i).', person.userName, person.personId, mother.userName, mother.personId, father.userName, father.personId ) msg
FROM dbo.persons person, dbo.relationship hasMother, dbo.persons mother, dbo.relationship hasFather, dbo.persons father
WHERE hasMother.relationshipType = 'mother'
AND hasFather.relationshipType = 'father'
AND MATCH ( father-(hasFather)->person<-(hasMother)-mother );
Full script available here.
For your specific question, the current release does not include transitive closure (the ability to loop through the graph n number of times) or polymorphism (find any node in the graph), so answering these queries may involve loops, recursive CTEs or temp tables. I have attempted this in my sample script and it works for your sample data, but it's just an example; I'm not 100% sure it will work with other data.
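The loop-based idea can be sketched outside the database as a breadth-first search over an undirected graph. This is not the sample script mentioned above, just a minimal illustration in Python using the sample rows from the question; the function names build_graph and connected_people are hypothetical. The set of visited IDs is what keeps the search from running forever when the graph contains loops.

```python
from collections import defaultdict, deque

# (ID, motherID, fatherID) rows from the person table in the question.
persons = [
    (1, None, None), (2, None, None), (3, 1, 2), (4, None, None),
    (5, None, None), (6, 5, 4), (7, None, None),
]
# (HusbandID, WifeID) rows from the marriage table in the question.
marriages = [(1, 2), (4, 5), (1, 5), (3, 6)]


def build_graph(persons, marriages):
    """Treat parent links and marriages as undirected edges."""
    graph = defaultdict(set)
    for person_id, mother_id, father_id in persons:
        graph[person_id]  # make sure isolated people get an entry too
        for parent_id in (mother_id, father_id):
            if parent_id is not None:
                graph[person_id].add(parent_id)
                graph[parent_id].add(person_id)
    for husband_id, wife_id in marriages:
        graph[husband_id].add(wife_id)
        graph[wife_id].add(husband_id)
    return graph


def connected_people(start_id, graph):
    """Breadth-first search; each ID is visited once, so loops are harmless."""
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen)


graph = build_graph(persons, marriages)
print(connected_people(1, graph))
print(connected_people(7, graph))
```

The same pattern translates to T-SQL as a WHILE loop that keeps inserting newly reached IDs into a temp table until no new rows are added.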
Let's say I have a report with following table in the body:
ID | Name | FirstName | GivenCity | RecordedCity | IsCorrectCity
1 | Gates | Bill | London | New York | No
2 | McCain | John | Brussels | Brussels | Yes
3 | Bullock | Lili | London | London | Yes
4 | Bravo | Johnny | Paris | Las Vegas | No
The column IsCorrectCity contains an expression that compares GivenCity and RecordedCity and returns No if they differ or Yes if they are equal.
Is it possible to add a report filter on the column IsCorrectCity (and if so, how) so that users can select only the records with No or only those with Yes? I know this can be done with a parameter in the SQL query, but I would like to base it on the expression rather than adding more calculations to the query.
Here's a tutorial which explains how you can do it:
Filtering Data Without Changing Dataset [SSRS]
As a database user, I'm having problems interpreting the data in one of our tables at work. When I questioned the data team, the solution architects told me it was done this way on purpose because it is a "Type 6" table.
From my limited googling, I think a Type 6 should look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 1/1/2001 | 6/8/2004 | 6/9/2004 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 6/9/2004 | 4/11/2016 | 4/12/2016 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 4/12/2016 | 4/3/2017 | 4/4/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
The activity in question is the Customer_Attrib1, how it changed from 1 to 2 on 5/18/2017.
I like this style because I can figure out what customer_attrib1 is at any point of time by using the start and end dates:
select customer_attrib1
from table
where customer_key=123
and '2017-03-01' between start_date and end_date;
However...
The table itself actually gets updated in arrears, to look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 1/1/2001 | 6/8/2004 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 6/9/2004 | 4/11/2016 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 4/12/2016 | 4/3/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
Can you see how much trouble I have, if I want to go find what the customer_attrib1 was during March of 2016?
NOTE: There is a previous_customer_attrib1 column, but it also gets mass updated to the value of 1. I wanted to keep the table small enough to get the point across, which is why I didn't add it above.
The big question: Is this a valid warehousing strategy? Is this really what Type 6 is? Or is my solution architect wrong?
Follow up question: Would the answer be different if customer_attrib1 was a foreign key to another table?
Your first example looks like a plain ol' Type 2 SCD. The second example looks like it is Type 1 (overwrite) on attribute 1 and Type 2 SCD on attribute 2.
Neither is a Type 6 as presented. A Type 6 lets you see both the history of the changes (in a Type 2 way, as in your first example) and the current values, typically by holding a separate set of columns for the current values or by linking to the current record. You mention the previous attrib 1 column, which is crucial to it being a Type 6. However, you would not expect that column to be bulk updated as well, because then you only get the one previous value and cannot see any changes prior to that.
Different people use "Type 6" to mean different things. What you need in a Type 6 is simply the value itself (which applies to the row at the time) and the current value (which is bulk updated when there is a change), along with (of course) the Type 2 approach of creating new rows for each change.
To answer your question, yes, I can see how much trouble you have with the design that's been given to you. It's a valid strategy if and only if it meets the business requirements. These techniques are only there to help meet business needs.
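A minimal sketch of that combination, assuming SQLite via Python's sqlite3 module and ISO-formatted dates so that BETWEEN compares correctly. The table name Customer_Dim, the column Current_Attrib1 and the helper functions are hypothetical names for illustration; only Customer_Attrib1 and the single change from 1 to 2 come from the question.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customer_Dim (
        Customer_Key     INTEGER,
        Customer_Attrib1 INTEGER,  -- value that applied while this row was valid
        Current_Attrib1  INTEGER,  -- hypothetical column: latest value, on every row
        Start_Date       TEXT,
        End_Date         TEXT
    )
    """
)
conn.execute("INSERT INTO Customer_Dim VALUES (123, 1, 1, '2001-01-01', '9999-12-31')")


def apply_change(conn, key, new_value, change_date):
    day_before = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    # Type 2 part: close the open row and add a new row for the new value.
    conn.execute(
        "UPDATE Customer_Dim SET End_Date = ? "
        "WHERE Customer_Key = ? AND End_Date = '9999-12-31'",
        (day_before, key),
    )
    conn.execute(
        "INSERT INTO Customer_Dim VALUES (?, ?, ?, ?, '9999-12-31')",
        (key, new_value, new_value, change_date),
    )
    # Type 1 part: overwrite only the current-value column on all rows.
    conn.execute(
        "UPDATE Customer_Dim SET Current_Attrib1 = ? WHERE Customer_Key = ?",
        (new_value, key),
    )


def attrib1_as_of(conn, key, as_of):
    """Return (value at that time, current value) for the given date."""
    return conn.execute(
        "SELECT Customer_Attrib1, Current_Attrib1 FROM Customer_Dim "
        "WHERE Customer_Key = ? AND ? BETWEEN Start_Date AND End_Date",
        (key, as_of),
    ).fetchone()


apply_change(conn, 123, 2, "2017-04-04")
print(attrib1_as_of(conn, 123, "2016-03-15"))
print(attrib1_as_of(conn, 123, "2017-06-01"))
```

The historical column is never touched after a row is written, so a lookup for March 2016 still finds the old value, while the bulk-updated column gives the current value on the same row.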
If the attribute is a foreign key then it becomes a bit more tricky, and we'd need more info about how that foreign keyed table was tracking history to be able to answer whether that changes anything.
Hello StackOverflow users,
I am struggling with transferring values between Access database tables which I will use in a Delphi program to tally election votes and determine the winning candidates. I have a total of six tables. One is my overall table, tblCandidates, which identifies each candidate and contains the number of votes they received from each party, namely the Grade Heads, the Teachers and the Learners. When it comes to the Learners we have four participating grades, namely the grade 8’s, 9’s, 10’s and 11’s, and each grade in turn has multiple participating classes, namely class A, B, C, etc.
Now, I have set up tables for each grade that contains all the classes in that grade. I named these tables tblGrX with X being the grade represented by 8 through 11. Each one of these tables has two extra fields, namely a field to identify a candidate and a field that will add up all the votes that candidate received from each of the classes in that grade. Lastly I have another table, tblGrTotals with fields Total_GrX (once again with X being the grade), that will contain all the total votes a candidate received from each grade, adding them up in another field for my tblCandidates table to use in its Total_Learners field.
So in short, I want, for example, tblGrTotals to use the value in the field Total of tblGr8 in its Total_Gr8 field, and then tblCandidates to use the value in field Total of tblGrTotals in its Total_Learners field. Is there any way to keep these values updated between tables like cells are updated in Excel the moment a change is made?
Thank you in advance!
You need to rethink your table design. I guess your background is Excel, and your tables are laid out like you would do in Excel sheets, but a relational database works differently.
Think about the objects you are modelling.
Candidates - that's easy. ID, Name, perhaps additional info that belongs to each candidate. But nothing about votes here.
"Groups that are voting" or Parties. Not so trivial, due to the different types of parties. Still I would put them in one table, with Grade and Class only set for Learners, NULL for Heads and Teachers.
e.g.
+----------+------------+-------+-------+
| Party_ID | Party_Type | Grade | Class |
+----------+------------+-------+-------+
| 1 | Head | | |
| 2 | Teacher | | |
| 3 | Learner | 8 | A |
| 4 | Learner | 8 | B |
| 5 | Learner | 8 | C |
| 6 | Learner | 9 | A |
| 7 | Learner | 9 | B |
| 8 | Learner | 10 | A |
+----------+------------+-------+-------+
Votes: they are a Junction Table between Candidates and Parties.
e.g.
+----------+--------------+-----------+
| Party_ID | Candidate_ID | Num_Votes |
+----------+--------------+-----------+
| 1 | 1 | 5 |
| 1 | 2 | 17 |
| 3 | 1 | 2 |
| 3 | 2 | 6 |
| 3 | 3 | 10 |
+----------+--------------+-----------+
Now: if you want to know the votes of Class 8A:
SELECT Candidate_ID, SUM(Num_Votes)
FROM Parties p INNER JOIN Votes v
ON p.Party_ID = v.Party_ID
WHERE p.Party_Type = 'Learner'
AND p.Grade = 8
AND p.Class = 'A'
GROUP BY Candidate_ID
Or of all Grade 8? Simply omit the p.Class criterion.
For the votes per candidate you join Candidates with Votes.
Edit:
If votes count differently depending on who casts them, that weight is an attribute of Party_Type.
We don't have a table for them yet, so create one:
+------------+---------------+
| Party_Type | Multiplicator |
+------------+---------------+
| Head | 4 |
| Teacher | 3 |
| Learner | 1 |
+------------+---------------+
and to count all votes:
SELECT c.Candidate_ID, c.Candidate_Name, SUM(v.Num_Votes * t.Multiplicator) AS SumVotes
FROM Parties p
INNER JOIN Votes v ON p.Party_ID = v.Party_ID
INNER JOIN Party_Types t ON p.Party_Type = t.Party_Type
INNER JOIN Candidates c ON v.Candidate_ID = c.Candidate_ID
GROUP BY c.Candidate_ID, c.Candidate_Name
With a design like this, you don't need to keep updating data from one table into another - you calculate it when and how you need it, and it's always current.
The magic of databases. :)
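To try the design end to end, here is a small sketch that builds the tables in an in-memory SQLite database from Python and runs both queries. SQLite stands in for Access here, which is an assumption of the example, and the candidate names are placeholders since none were given; the vote counts and multiplicators are the ones from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Candidates (Candidate_ID INTEGER PRIMARY KEY, Candidate_Name TEXT);
    CREATE TABLE Party_Types (Party_Type TEXT PRIMARY KEY, Multiplicator INTEGER);
    CREATE TABLE Parties (
        Party_ID INTEGER PRIMARY KEY, Party_Type TEXT, Grade INTEGER, Class TEXT
    );
    CREATE TABLE Votes (
        Party_ID INTEGER, Candidate_ID INTEGER, Num_Votes INTEGER,
        PRIMARY KEY (Party_ID, Candidate_ID)
    );

    -- Placeholder candidate names (not from the original post).
    INSERT INTO Candidates VALUES (1, 'Candidate 1'), (2, 'Candidate 2'), (3, 'Candidate 3');
    INSERT INTO Party_Types VALUES ('Head', 4), ('Teacher', 3), ('Learner', 1);
    INSERT INTO Parties VALUES
        (1, 'Head', NULL, NULL), (2, 'Teacher', NULL, NULL),
        (3, 'Learner', 8, 'A'), (4, 'Learner', 8, 'B'), (5, 'Learner', 8, 'C'),
        (6, 'Learner', 9, 'A'), (7, 'Learner', 9, 'B'), (8, 'Learner', 10, 'A');
    INSERT INTO Votes VALUES (1, 1, 5), (1, 2, 17), (3, 1, 2), (3, 2, 6), (3, 3, 10);
    """
)

# Votes of Class 8A per candidate.
class_8a = conn.execute(
    """
    SELECT v.Candidate_ID, SUM(v.Num_Votes)
    FROM Parties p INNER JOIN Votes v ON p.Party_ID = v.Party_ID
    WHERE p.Party_Type = 'Learner' AND p.Grade = 8 AND p.Class = 'A'
    GROUP BY v.Candidate_ID
    ORDER BY v.Candidate_ID
    """
).fetchall()

# Weighted total per candidate across all parties.
weighted = conn.execute(
    """
    SELECT c.Candidate_ID, c.Candidate_Name, SUM(v.Num_Votes * t.Multiplicator) AS SumVotes
    FROM Parties p
    INNER JOIN Votes v ON p.Party_ID = v.Party_ID
    INNER JOIN Party_Types t ON p.Party_Type = t.Party_Type
    INNER JOIN Candidates c ON v.Candidate_ID = c.Candidate_ID
    GROUP BY c.Candidate_ID, c.Candidate_Name
    ORDER BY c.Candidate_ID
    """
).fetchall()

print(class_8a)
print(weighted)
```

Changing a single row in Votes and rerunning the queries gives updated totals immediately, which is the spreadsheet-like behaviour the question asks for without copying values between tables.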