Hierarchy in SQL - sql-server

We have a SQL database at work with a table, employees, that has a column, report_to, which contains the username of the person that employee reports to. What we want to do is change this representation to a numerical representation. For instance:
'a' reports to 'b', who reports to 'c'. So the representation would be something like 'a' = 49, 'b' = 50, 'c' = 51. If 'd' becomes 'c''s boss, then 'd' = 52. If 'a' becomes the supervisor of interns 'e' and 'f', then 'e' and 'f' are both 48.
As shown, starting the numbers at a non zero number allows for expansion not only upwards but also down the hierarchical chain.
The main question is, how do I convert from the current structure ("report_to"), to a numerical representation?
NOTE: this is in MSSQL

You can add a new column (rank) initialized to 0.
Then the first step is to find the BIG BOSS: the user who doesn't have a boss (report_to is null). His rank will be 1.
The second step is to find his direct reports. They will rank as 2. Something like:
UPDATE employees SET RANK = 2
WHERE report_to IN
(SELECT username FROM employees WHERE RANK = 1)
The third step is to find the direct reports of those direct reports. Something like:
UPDATE employees SET RANK = 3
WHERE report_to IN
(SELECT username FROM employees WHERE RANK = 2)
The next steps are identical to steps 2 and 3, until no row with RANK = 0 is left.
All these steps can be done in a procedure, within a WHILE statement.
In the end, if you would like to start the ranking from 50 instead of 1, then you can make an update:
UPDATE employees SET RANK = 50 - RANK
or, to be sure no rank ends up zero or negative however deep the hierarchy is:
UPDATE employees SET RANK = (SELECT MAX(RANK) FROM employees) + 1 - RANK
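A minimal sketch of the level-by-level loop above, using Python's built-in sqlite3 module as a stand-in for a SQL Server procedure (the sample rows are hypothetical, mirroring the question's 'a'/'b'/'c' example). One deliberate difference: the loop stops when an UPDATE touches no rows, rather than when no RANK = 0 is left, so an employee whose chain never reaches the boss cannot cause an endless loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (username TEXT PRIMARY KEY, report_to TEXT, "
    "rank INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical data: 'a' -> 'b' -> 'c' (top), and interns 'e', 'f' report to 'a'.
conn.executemany(
    "INSERT INTO employees (username, report_to) VALUES (?, ?)",
    [("c", None), ("b", "c"), ("a", "b"), ("e", "a"), ("f", "a")],
)

# Step 1: the big boss (report_to IS NULL) gets rank 1.
conn.execute("UPDATE employees SET rank = 1 WHERE report_to IS NULL")

# Steps 2..n: rank the direct reports of the previous level, one level per pass.
level = 1
while True:
    cur = conn.execute(
        "UPDATE employees SET rank = ? "
        "WHERE rank = 0 AND report_to IN "
        "(SELECT username FROM employees WHERE rank = ?)",
        (level + 1, level),
    )
    if cur.rowcount == 0:  # nobody new was reached, so we are done
        break
    level += 1

# Final step: flip the numbers so the boss ends up highest and nothing drops below 1.
max_rank = conn.execute("SELECT MAX(rank) FROM employees").fetchone()[0]
conn.execute("UPDATE employees SET rank = ? + 1 - rank", (max_rank,))

ranks = dict(conn.execute("SELECT username, rank FROM employees"))
print(ranks)
```

With this data the boss 'c' ends up at 4 and the interns at 1; adding a constant afterwards shifts the whole range if you want numbers around 50.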

If you have a field in the table that contains the employee's supervisor, you can use a recursive CTE to get the hierarchy. Look that up in Books Online and get back to us if you have any questions.
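A small sketch of such a recursive CTE, run through Python's sqlite3 so it is self-contained. SQLite spells it WITH RECURSIVE; in SQL Server you write plain WITH, and the same anchor / UNION ALL / recursive-member shape applies (subject to the default MAXRECURSION limit of 100 levels). The data is hypothetical, and mapping depth onto 52, 51 and so on just reproduces the question's example numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (username TEXT PRIMARY KEY, report_to TEXT)")
# Hypothetical data matching the question: d > c > b > a > (e, f).
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("d", None), ("c", "d"), ("b", "c"), ("a", "b"), ("e", "a"), ("f", "a")],
)

# Anchor member: the top of the hierarchy at depth 0.
# Recursive member: everyone reporting to a row already found, one level deeper.
HIERARCHY_SQL = """
WITH RECURSIVE chain(username, depth) AS (
    SELECT username, 0 FROM employees WHERE report_to IS NULL
    UNION ALL
    SELECT e.username, chain.depth + 1
    FROM employees AS e
    JOIN chain ON e.report_to = chain.username
)
SELECT username, depth FROM chain ORDER BY depth, username
"""
rows = conn.execute(HIERARCHY_SQL).fetchall()

# Turn depth into the question's numbering: the top person gets 52, each level below one less.
TOP_NUMBER = 52  # arbitrary starting point taken from the question's example
numbers = {username: TOP_NUMBER - depth for username, depth in rows}
print(numbers)
```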

Wow... so do you have a users table? If not, then suggestion 1 is to create one.
users_table
------------
username
user_id
name_first
name_last
other_stuff_?
Then populate that with all the existing usernames, possibly by querying the table you are describing for unique names. The user_id will be populated as a sequenced ID value during this step.
Then you can add a new table,
user_user
-----------
user_id_1
user_id_2
relationship
begin_dt
end_dt
Then you can populate this new table with each user-to-user relationship and when it was valid, e.g. user 48 was related to user 50 beginning on some date with relationship = 'Manages'.
The relationship should probably be a FK to yet another table... but I leave that to you as an exercise.
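A rough sketch of the two-table design above in Python's sqlite3 (standing in for SQL Server). It assumes user_id_1 manages user_id_2 when relationship = 'Manages', stores dates as ISO text, and uses a made-up start date; the reorganisation at the end shows how begin_dt/end_dt keep history instead of overwriting report_to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical existing table, as described in the question.
CREATE TABLE employees (username TEXT PRIMARY KEY, report_to TEXT);
INSERT INTO employees VALUES ('c', NULL), ('b', 'c'), ('a', 'b');

CREATE TABLE users_table (
    user_id  INTEGER PRIMARY KEY,   -- sequenced id, assigned automatically on insert
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE user_user (
    user_id_1    INTEGER NOT NULL REFERENCES users_table (user_id),
    user_id_2    INTEGER NOT NULL REFERENCES users_table (user_id),
    relationship TEXT NOT NULL,     -- e.g. 'Manages': user_id_1 manages user_id_2
    begin_dt     TEXT NOT NULL,     -- ISO dates so text comparison orders correctly
    end_dt       TEXT               -- NULL while the relationship is still valid
);

-- Populate users from every distinct username in the old table.
INSERT INTO users_table (username)
SELECT username FROM employees
UNION
SELECT report_to FROM employees WHERE report_to IS NOT NULL;

-- One 'Manages' row per existing report_to link (the start date is made up).
INSERT INTO user_user (user_id_1, user_id_2, relationship, begin_dt, end_dt)
SELECT m.user_id, s.user_id, 'Manages', '2020-01-01', NULL
FROM employees AS e
JOIN users_table AS s ON s.username = e.username
JOIN users_table AS m ON m.username = e.report_to;
""")


def managers_of(conn, username, on_date):
    """Return the usernames that managed `username` on `on_date` (YYYY-MM-DD)."""
    return [row[0] for row in conn.execute(
        """
        SELECT m.username
        FROM user_user AS r
        JOIN users_table AS m ON m.user_id = r.user_id_1
        JOIN users_table AS s ON s.user_id = r.user_id_2
        WHERE s.username = ? AND r.relationship = 'Manages'
          AND r.begin_dt <= ? AND (r.end_dt IS NULL OR r.end_dt >= ?)
        ORDER BY m.username
        """,
        (username, on_date, on_date),
    )]


# Reorganisation: 'a' moves from 'b' to 'c' on 2023-01-01; the old row is closed, not deleted.
with conn:
    conn.execute(
        "UPDATE user_user SET end_dt = '2022-12-31' "
        "WHERE relationship = 'Manages' AND end_dt IS NULL AND user_id_2 = "
        "(SELECT user_id FROM users_table WHERE username = 'a')"
    )
    conn.execute(
        "INSERT INTO user_user VALUES ("
        "(SELECT user_id FROM users_table WHERE username = 'c'), "
        "(SELECT user_id FROM users_table WHERE username = 'a'), "
        "'Manages', '2023-01-01', NULL)"
    )

print(managers_of(conn, "a", "2021-06-01"), managers_of(conn, "a", "2024-01-01"))
```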

In my opinion, you don't need to use numeric counters; just use positions, because you can't have unlimited positions, it's going to stop somewhere. Each username should have a position, like intern, supervisor, employer, project manager or whatever. When you raise, let's say, an intern's position above a supervisor's, they become an employer, or something similar. You get the idea. :)

Related

Filtering SQL rows based on certain alphabets combination

I have a column that stores user input from a free-text field on a frontend website. Users can input any kind of text in it, but they will also put in a specific letter combination to represent a job type, for example 'dri'. As an example:
Row 1: P49384; Open vehicle bonnet-BO-dri 22/10
Row 2: P93818; Vehicle exhaust-BO 10/20
Row 3: P1933; battery dri-pu-103/2
Row 4: P3193; screwdriver-pu 423
Row 5: X939; seats bo
Row 6: P9381-vehicle-pu-bo dri
In this case, I would like to filter only rows that contain dri. From the example, you can see the text can be in any order (user behaviour: they will key in whatever they like without following any kind of format). But the constant is that for a particular job type, they will put in dri.
I know that I can simply use LIKE in SQL Server to get these rows. Unfortunately, row 4 is included when I use this operator. This is because screwdriver contains dri.
Is there any way in SQL Server to strictly obtain only rows that have the dri job type, while excluding words like screwdriver?
I tried to use PATINDEX but it failed too - PATINDEX('%[d][r][i]%', column) > 0
Thanks in advance.
Your data is the problem here. Unfortunately even for denormalised data it doesn't appear to have a reliable/defined format, making parsing your data in a language like T-SQL next to impossible. What problems are there? Based on the original sample data, at a glance the following problems exist:
The first data value's delimiter isn't consistent. Rows 1-5 use a semicolon (;), but row 6 uses a hyphen (-)
The last data value's delimiter isn't consistent. Rows 1, 2 & 4 use a space ( ), but row 3 uses a hyphen (-).
Internal data doesn't use a consistent delimiter. For example:
Row 1 has the value Open vehicle bonnet-BO-dri, which appears to be the values Open vehicle bonnet, BO and dri, so the hyphen (-) is the delimiter.
Row 5 has seats bo, which appears to be the values seats and bo, so uses a space ( ) as a delimiter.
The fact that row 6 has vehicle as its own value (vehicle-pu-bo dri), however, implies that Open vehicle bonnet and Vehicle exhaust (on rows 1 and 2 respectively) could actually be the values Open, vehicle & bonnet and Vehicle & exhaust respectively.
Honestly, the solution is to fix your design. As such, your tables should likely look something like this:
CREATE TABLE dbo.Job (JobID varchar(6) CONSTRAINT PK_JobID PRIMARY KEY NONCLUSTERED, --NONCLUSTERED Because it's not always ascending
YourNumericalLikeValue varchar(5) NULL); --Obviously use a better name
CREATE TABLE dbo.JobTypeCompleted(JobTypeID int IDENTITY (1,1) CONSTRAINT PK_JobTypeID PRIMARY KEY CLUSTERED,
JobID varchar(6) NOT NULL CONSTRAINT FK_JobType_Job FOREIGN KEY REFERENCES dbo.Job (JobID),
JobType varchar(30) NOT NULL); --Most likely this'll actually be a foreign key to an actual job type table
GO
Then, for a couple of your rows, the data would be inserted like so:
INSERT INTO dbo.Job (JobID, YourNumericalLikeValue)
VALUES('P49384','22/10'),
('P9381',NULL);
GO
INSERT INTO dbo.JobTypeCompleted(JobID,JobType)
VALUES('P49384','Open vehicle bonnet'),
('P49384','BO'),
('P49384','dri'),
('P9381','vehicle'),
('P9381','pu'),
('P9381','bo'),
('P9381','dri');
Then you can easily get the jobs you want with a simple query:
SELECT J.JobID,
J.YourNumericalLikeValue
FROM dbo.Job J
WHERE EXISTS (SELECT 1
FROM dbo.JobTypeCompleted JTC
WHERE JTC.JobID = J.JobID
AND JTC.JobType = 'dri');
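To see the normalized design pay off, here is a sketch that replays the tables and the EXISTS query in Python's sqlite3 (SQLite types stand in for the SQL Server ones). Row 4 (P3193, screwdriver-pu 423) is added, split the same way by assumption, to show that it no longer matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (
    JobID                  TEXT PRIMARY KEY,
    YourNumericalLikeValue TEXT NULL
);
CREATE TABLE JobTypeCompleted (
    JobTypeID INTEGER PRIMARY KEY AUTOINCREMENT,
    JobID     TEXT NOT NULL REFERENCES Job (JobID),
    JobType   TEXT NOT NULL
);
INSERT INTO Job (JobID, YourNumericalLikeValue)
VALUES ('P49384', '22/10'), ('P9381', NULL), ('P3193', '423');
INSERT INTO JobTypeCompleted (JobID, JobType)
VALUES ('P49384', 'Open vehicle bonnet'), ('P49384', 'BO'), ('P49384', 'dri'),
       ('P9381', 'vehicle'), ('P9381', 'pu'), ('P9381', 'bo'), ('P9381', 'dri'),
       ('P3193', 'screwdriver'), ('P3193', 'pu');
""")

# Exact match on the JobType value, so 'screwdriver' can never match 'dri'.
DRI_JOBS_SQL = """
SELECT J.JobID, J.YourNumericalLikeValue
FROM Job AS J
WHERE EXISTS (SELECT 1
              FROM JobTypeCompleted AS JTC
              WHERE JTC.JobID = J.JobID
                AND JTC.JobType = ?)
ORDER BY J.JobID
"""
dri_jobs = conn.execute(DRI_JOBS_SQL, ("dri",)).fetchall()
print(dri_jobs)  # [('P49384', '22/10'), ('P9381', None)]
```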
You can apply the LIKE operator in your query as column_name LIKE '%-dri'. It finds the records that end with "-dri".

SQL create new unique random ID

I'm creating an inventory application which uses a SQL database to keep track of products.
The ProductNumber is in the format yyyy-xxxx (e.g. 8024-1234), where the first 4 digits describe a category and the last 4 digits describe an increasing integer, together creating the product number.
When creating a new product, the category should first be approved by an administrator, and therefore all new products will be added as 9999-xxxx. Then later, when the product is approved in the category, its product number will change to the correct ProductNumber.
What I need for this is, when creating a new product, to generate a random number for the last 4 digits and then check that it doesn't already exist in the database (together with the first 4 digits). So, when creating a new product, some SQL query should create, for example, 9999-0123 and then double-check that this one doesn't exist already.
How could one achieve this?
Thanks in advance!
You didn't specify the DBMS you are using, but here is a potential solution using Oracle PL/SQL:
DECLARE
temp VARCHAR2(10);
any_rows_found NUMBER;
row_exist BOOLEAN := TRUE;
BEGIN
WHILE row_exist
LOOP
-- random suffix 0000-9999, left-padded to four digits
temp := '9999-' || LPAD(TRUNC(DBMS_RANDOM.VALUE(low => 0, high => 10000)), 4, '0');
SELECT COUNT(*) INTO any_rows_found FROM my_table WHERE my_column = temp;
-- PL/SQL does not allow an empty THEN branch, so only the "not found" case is handled
IF any_rows_found = 0 THEN
row_exist := FALSE;
INSERT INTO my_table VALUES (..................., temp);
END IF;
END LOOP;
END;
We use DBMS_RANDOM to generate the random value, concatenate it to 9999-, and then check whether it exists: if it does, we loop to generate another value; if it doesn't, we insert the value.
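For comparison, the same generate/check/insert loop as a Python sketch over sqlite3 (not Oracle; my_table and my_column are the placeholders from the answer). The attempt cap is an addition: as the table fills up, collisions become more frequent, and once all 10,000 suffixes are taken the loop would otherwise never terminate.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the PRIMARY KEY backs up the existence check if two sessions race.
conn.execute("CREATE TABLE my_table (my_column TEXT PRIMARY KEY)")


def insert_pending_number(conn, rng=random, max_attempts=1000):
    """Insert and return an unused 9999-xxxx code, retrying on collisions."""
    for _ in range(max_attempts):
        # Four random digits, zero-padded, so 9999-0123 is possible.
        candidate = "9999-{:04d}".format(rng.randrange(10000))
        taken = conn.execute(
            "SELECT 1 FROM my_table WHERE my_column = ?", (candidate,)
        ).fetchone()
        if taken is None:
            conn.execute("INSERT INTO my_table (my_column) VALUES (?)", (candidate,))
            return candidate
    # Without a cap the loop never ends once all 10,000 codes are used.
    raise RuntimeError("no free 9999-xxxx code found")


rng = random.Random(42)  # seeded only to make the demo repeatable
codes = [insert_pending_number(conn, rng) for _ in range(50)]
print(codes[:3])
```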
regards
You can generate your product number with a sequence, if you'd like an incremental number:
CREATE SEQUENCE product_number
START WITH 1000
INCREMENT BY 1
NOCACHE
NOCYCLE;
Whenever you insert or update a product and need a valid number, just call product_number.NEXTVAL. Then in your product table set (category, product_number) as the primary key (or the product number itself). If you can't set the primary key like that and want to check whether the item already exists with that serial number, you can generate the sequence number using:
SELECT product_number.NEXTVAL FROM DUAL;
Then check if the product with the generated number exists.
I didn't know what dialect of SQL you are using; this is Oracle SQL, but it can be applied in other dialects too.
Also not sure about the target DB - but worked it out for MS-SQL.
First, I would not recommend the approach of generating a random number and then checking whether it exists, potentially doing this over and over again.
Instead, you could get the current maximum ProductNumber and work from there. Even with a varchar you will retrieve the max int, since your format is always cccc-pppp (c = category / p = product). In addition, you will get your desired value straight away, since the target category "9999" is the highest possible prefix.
You could work with something like this:
DECLARE @newID int;
-- REPLACE removes the hyphen so we are facing an actual integer
-- CAST makes it possible to calculate with the value, i.e. add 1 on top of it
-- MAX retrieves the highest value
SELECT @newID = MAX(CAST(REPLACE(ProductNumber, '-', '') AS int)) + 1 FROM Test;
-- Default to 99990000 when there are no values with the 9999 prefix
-- (MAX returns NULL on an empty table, and NULL < 99990000 is never true)
IF @newID IS NULL OR @newID < 99990000
BEGIN
SET @newID = 99990000;
END
-- Push the hyphen back into the new ID, giving the final new ProductNumber
-- 5 is the starting index
-- 0 since no chars from the original ID shall be removed
SELECT STUFF(CAST(@newID AS varchar(8)), 5, 0, '-');
So if your highest-numbered product is currently 9999-1423, this would return "9999-1424".
If there are no products with the prefix of "9999" you would simply get "9999-0000".
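The same MAX + 1 logic as a Python sketch over sqlite3, so the edge cases can be tried quickly; Test and ProductNumber follow the answer, and the sample rows are made up. As with any MAX + 1 scheme, two sessions can read the same maximum at once, so a unique constraint on ProductNumber is still the real safeguard.

```python
import sqlite3


def next_pending_number(conn):
    """Return the next 9999-xxxx code from table Test, following the MAX + 1 approach."""
    # REPLACE drops the hyphen, CAST makes it an integer, MAX + 1 gives the next one.
    new_id = conn.execute(
        "SELECT MAX(CAST(REPLACE(ProductNumber, '-', '') AS INTEGER)) + 1 FROM Test"
    ).fetchone()[0]
    # MAX is NULL on an empty table; anything below 99990000 means no 9999 codes yet.
    if new_id is None or new_id < 99990000:
        new_id = 99990000
    digits = str(new_id)
    # Put the hyphen back after the first four digits (what STUFF(..., 5, 0, '-') does).
    return digits[:4] + "-" + digits[4:]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ProductNumber TEXT PRIMARY KEY)")
print(next_pending_number(conn))  # 9999-0000 (empty table)
conn.execute("INSERT INTO Test VALUES ('8024-1234'), ('9999-1423')")
print(next_pending_number(conn))  # 9999-1424
```

Once 9999-9999 exists, MAX + 1 spills into nine digits, so that limit needs handling as well.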
The ProductNumber is in the format yyyy-xxxx (e.g. 8024-1234), where the first 4 digits describe a category and the last 4 digits describe an increasing integer, together creating the product number.
We will implement this with a computed column that puts together the category and the product number, which will be in their own individual fields.
When creating a new product, the category should first be approved by an administrator, and therefore all new products will be added as 9999-xxxx. Then later, when the product is approved in the category, its product number will change to the correct ProductNumber.
Put simply, by default every new product is automatically assigned product category 9999.
What I need for this is, when creating a new product, to generate a random number for the last 4 digits and then check that it doesn't already exist in the database (together with the first 4 digits). So, when creating a new product, some SQL query should create, for example, 9999-0123 and then double-check that this one doesn't exist already.
This can be implemented as an identity column. This is not random, but I assume that is not really a requirement, right?
Keep in mind there are many holes in these requirements.
If your product number changes from 9999-1234 to 8024-1234 but has already appeared on reports / documents as 9999-1234, that's a problem
The four-digit suffix supports at most 10,000 numbers (0000 to 9999). Then your system breaks
Again, does the number really need to be random?
I won't go into the actual mechanism for approval and assignment; you'll need to ask that in another question once this one is solved.
ProductNumber is in fact not a number, it's a code, so I don't agree with that column name
On to the code.
Create a table by running this:
CREATE TABLE dbo.Products
(
ProductID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ProductName VARCHAR(100),
ProductCategoryID INT NOT NULL DEFAULT (9999),
ProductNumber AS (FORMAT(ProductCategoryID,'0000') + '-' + FORMAT(ProductID,'0000'))
)
Some explanation of the columns:
ProductID will autogenerate an incrementing number, starting at 1, incrementing by 1 each time. It's guaranteed to be unique. It's also defined as the primary key
ProductCategoryID will default to 9999 if you don't specify anything for it
ProductNumber is the special value you were after calculated from two individual columns
Now create a new product and see what happens:
INSERT INTO dbo.Products(ProductName)
VALUES ('Brown Shoes')
SELECT * FROM dbo.Products
You should see ProductNumber 9999-0001.
Add some more and note that the product code increments. It is not random. Carefully consider if you actually really need this to be random.
Now set the actual product category:
UPDATE dbo.Products
SET ProductCategoryID = 7 WHERE ProductID = 1
SELECT * FROM dbo.Products
and note that the product number updates.
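A sketch of the same idea outside SQL Server, using Python's sqlite3: the category and the ID are stored separately and the code is derived on read. Here a view (ProductsWithNumber, a name made up for the example) plays the role of the computed column, and printf stands in for FORMAT, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID         INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the IDENTITY(1,1) role
    ProductName       TEXT,
    ProductCategoryID INTEGER NOT NULL DEFAULT 9999
);
-- SQLite has no FORMAT(); printf('%04d', ...) gives the same zero padding.
CREATE VIEW ProductsWithNumber AS
SELECT ProductID, ProductName, ProductCategoryID,
       printf('%04d-%04d', ProductCategoryID, ProductID) AS ProductNumber
FROM Products;
""")


def product_number(conn, product_id):
    """Read the derived code for one product."""
    return conn.execute(
        "SELECT ProductNumber FROM ProductsWithNumber WHERE ProductID = ?",
        (product_id,),
    ).fetchone()[0]


conn.execute("INSERT INTO Products (ProductName) VALUES ('Brown Shoes')")
print(product_number(conn, 1))  # 9999-0001

conn.execute("UPDATE Products SET ProductCategoryID = 7 WHERE ProductID = 1")
print(product_number(conn, 1))  # 0007-0001
```

Because nothing stores the code itself, there is no second copy to keep in sync when the category is approved.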
It's important to note that the real product ID is actually just ProductID. The ProductNumber column is just something to satisfy your requirements.

How to log or notify when a column is truncated using a LEFT()

As part of our OLAP modeling workflow, we often truncate fields because upstream data sources have no restrictions or defined data types. A field which should be a 10-character string can sometimes be 50 or 100 characters long if it is free-form user input. I've been told this can cause problems with downstream processes which involve uploads to external sources.
I've been asked to find a way to identify instances in which one or more of these fields is truncated.
How we handle these fields now is something like this:
SELECT
LEFT(FreeResponseField, 10) AS Comment
INTO
dbo.ModeledTable
FROM
dbo.SourceTable
Essentially if the field is greater than 10 characters, who cares, we only take the first 10.
If dbo.SourceTable.FreeResponseField has a length greater than 10, now we want to know somehow (be it a warning/error message or insertion into a log table). We have a lot of tables with a lot of fields, so the above example is a simplification. Identifying just the field in which this occurs and/or the tuple in the table would be helpful to see where these issues are occurring.
Is something like this possible? You can't just compare data types of the source table with the target table, as the source table sets everything to essentially VARCHAR(MAX). The naive approach is to check the length of every single value of every tuple against the defined length of the target table.
The original specifications weren't descriptive, but I've figured out a solution and thought I'd share in case anyone stumbles across this for some reason.
Imagine we have a SourceTable which we are pulling into our model. We have defined zip codes as being of length 5 and addresses as being of length 25. Say we have the following two records:
CustomerID | ZipCode | Address
1 | 90210 | 123 Fake Street
2 | 902106 | 546 Fake Street
Based on our model definitions, there is an error with ZipCode for the record where CustomerID equals 2. We would like to identify both ZipCode as being the problem field and the record where CustomerID equals 2. The following query with a CROSS APPLY does that:
WITH CTE AS (
SELECT
CustomerID,
ZipCodeFlag = IIF(LEN(ZipCode) > 5, 1, 0),
AddressFlag = IIF(Len(Address) > 25, 1, 0),
ZipCode,
Address
FROM
SourceTable
)
SELECT
CustomerID,
TruncatedField,
RawValue
FROM
CTE
CROSS APPLY (
VALUES ('ZipCode', ZipCodeFlag, ZipCode),
('Address', AddressFlag, Address)
) CA(TruncatedField, TruncatedFlag, RawValue)
WHERE
TruncatedFlag = 1
ORDER BY
CustomerID
With the following output:
CustomerID | TruncatedField | RawValue
2 | ZipCode | 902106
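Since the question also mentions a log table, here is a sketch of writing the flagged rows into one, using Python's sqlite3. SQLite has no CROSS APPLY, so each column check becomes its own SELECT and UNION ALL stitches them together; TruncationLog and MAX_LENGTHS are hypothetical names. Generating the checks from a dictionary is one way to cope with many tables and fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SourceTable (CustomerID INTEGER, ZipCode TEXT, Address TEXT);
INSERT INTO SourceTable VALUES (1, '90210', '123 Fake Street'),
                               (2, '902106', '546 Fake Street');
CREATE TABLE TruncationLog (CustomerID INTEGER, TruncatedField TEXT, RawValue TEXT);
""")

# Target lengths from the model definition. Column names are interpolated into
# SQL below, so this mapping must come from trusted metadata, never user input.
MAX_LENGTHS = {"ZipCode": 5, "Address": 25}

# One SELECT per checked column, combined with UNION ALL; this unpivots the
# columns the same way CROSS APPLY (VALUES ...) does in the T-SQL version.
checks = " UNION ALL ".join(
    f"SELECT CustomerID, '{col}', {col} FROM SourceTable WHERE LENGTH({col}) > {limit}"
    for col, limit in MAX_LENGTHS.items()
)
conn.execute(f"INSERT INTO TruncationLog (CustomerID, TruncatedField, RawValue) {checks}")

log = conn.execute(
    "SELECT CustomerID, TruncatedField, RawValue FROM TruncationLog "
    "ORDER BY CustomerID, TruncatedField"
).fetchall()
print(log)  # [(2, 'ZipCode', '902106')]
```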

SQL Server Insert Random

I have this table:
Create Table Person
(
Consecutive Integer Identity(1,1),
Identification Varchar(15) Primary Key
)
The Identification column can contain letters and numbers, and it is optional, i.e. the customer can enter it or not; if not, a number is created automatically. How can I insert a random number that does not already exist, preferably a low number?
An example could be:
Select Random From Person Where Random Not Exists In Identification
This is my code:
Select Min(Convert(Integer,Identification)) - 1
From Person
Where IsNumeric(Identification) = 1
Or
Select Max(Convert(Integer,Identification)) + 1
From Person
Where IsNumeric(Identification) = 1
This works well, but if the customer enters a high number, for example 1000 or higher, then the numbering will continue from there and could eventually hit an overflow error.
And with the MIN version, if there is no number below the lowest Identification that is greater than 0, it will produce -1, -2, -3, etc.
Thanks in advance..
I agree with what M.Ali said. You can make use of the code below, but I still don't recommend it over what M.Ali said.
The loop will continue until a random number is generated which is not in your table. You can change the precision to 5 digits by changing 1000 to 10000, and so on.
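DECLARE @I INT = 0;
DECLARE @RANDOM INT;
WHILE (@I = 0)
BEGIN
-- CRYPT_GEN_RANDOM(3) converts to a non-negative INT; % 1000 keeps 0-999, so this yields 1000-1999
SELECT @RANDOM = 1000 + (CONVERT(INT, CRYPT_GEN_RANDOM(3)) % 1000);
-- VARCHAR(15) matches the Identification column, so widening to 5+ digits still converts cleanly
IF NOT EXISTS (SELECT Identification FROM YOURTABLE WHERE Identification = CAST(@RANDOM AS VARCHAR(15)))
BEGIN
-- Do your stuff here
BREAK;
END
-- Otherwise the value is taken: loop and try another one
-- (an empty ELSE BEGIN ... END block is a syntax error in T-SQL)
END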
Maintaining a random VARCHAR(15) value which depends on the end user's input can be a very expensive approach when you also want it to be unique.
Imagine a scenario where you have a decent number of rows, say 10,000, in this table and a user comes in trying to insert a random number; chances are the user may try 5, 10 or even 15 times to get a unique random value.
On each failed attempt a call will be made to the server and a search will be done on the table (the more rows, the more expensive this query becomes), and the more failed attempts a user makes, the more disappointment and poor application experience the user will have.
Would you ever go back to an application (web/Windows) where just for registration you had to struggle this many times? Obviously not.
The moral of the story: if you are asking a user to enter some random value, do not expect users to maintain your database integrity and keep that column unique. Take control and pair that value with another column which is guaranteed to be unique; in your case it can be the Identity column. Alternatively, you can generate that value for the user yourself, using a GUID.
select count(*) +1 from Person
This generates a logical ID for Identification that sets the ID to what it 'would have been' with a simple incrementor.
However, you then cannot delete records; instead you must deactivate them, or clear the row.
Alternatively, have a separate (hidden) column that only auto-increments, and if Identification is left empty, use the value from the hidden column. Same result, but less risk if deletion is relevant.
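A minimal sketch of that last suggestion in Python's sqlite3, with one assumption that differs from the question's schema: Identification is UNIQUE but nullable instead of being the primary key, so an empty value is stored as NULL and replaced by the hidden counter when read. SQLite allows many NULLs in a UNIQUE column; SQL Server would need a filtered unique index for that. Deleted rows never give their counter back, which avoids the count(*) + 1 problem, although a customer could still type a value equal to another row's counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Person (
    Consecutive    INTEGER PRIMARY KEY AUTOINCREMENT,  -- hidden, only auto-increments
    Identification TEXT UNIQUE                         -- optional, entered by the customer
)
""")


def add_person(conn, identification=None):
    """Insert a person and return the hidden Consecutive value."""
    return conn.execute(
        "INSERT INTO Person (Identification) VALUES (?)", (identification,)
    ).lastrowid


def effective_identification(conn, consecutive):
    """Customer's value when given, otherwise the hidden counter as text."""
    return conn.execute(
        "SELECT COALESCE(Identification, CAST(Consecutive AS TEXT)) "
        "FROM Person WHERE Consecutive = ?",
        (consecutive,),
    ).fetchone()[0]


first = add_person(conn, "ABC123")
second = add_person(conn)
print(effective_identification(conn, first), effective_identification(conn, second))  # ABC123 2
```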

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So lets say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return those records between 'rank' 2 and 4, so my results set is Bob, Sid and Billy? However, Rank is not part of the table, and this should be generated when the query is run. Why don't I use an autogenerated number, because if a record is deleted, this will be inconsistent, and what if I wanted the results in reverse!
This obviously very simple, and the reason I ask is because I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset, so if I only return 1 page worth of data from the database this is obviously going to be quicker then return a complete set of 3000 records and then having to subselect from that set!
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after the comment that rank does not exist in the structure:
Select top 100 * from table order by ID;
And if you want to choose subsequent results, you can take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100 order by ID;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to noticeably degrade beyond a few hundred rows.
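A sketch of using that correlated subquery for paging, run in Python's sqlite3 rather than Access. The derived-table wrapper lets the outer WHERE filter on the computed rank, which a WHERE in the same SELECT cannot reference by alias. It keeps the guess that rank means alphabetical order of first_name; with these five rows, ranks 2 to 4 come out as Bob, Max and Sally rather than Bob, Sid and Billy, which shows that the rank always has to come from some ordering rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("Max", 23), ("Bob", 40), ("Sid", 25), ("Billy", 18), ("Sally", 19)],
)

# Rank = 1 + number of rows whose name sorts before this one (alphabetical rank).
RANK_PAGE_SQL = """
SELECT first_name, age, rank
FROM (
    SELECT T1.first_name, T1.age,
           (SELECT COUNT(*) + 1
            FROM MyTable AS T2
            WHERE T1.first_name > T2.first_name) AS rank
    FROM MyTable AS T1
) AS ranked
WHERE rank BETWEEN ? AND ?
ORDER BY rank
"""
page = conn.execute(RANK_PAGE_SQL, (2, 4)).fetchall()
print(page)  # [('Bob', 40, 2), ('Max', 23, 3), ('Sally', 19, 4)]
```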
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings, but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows into an ADO recordset and then finding the page (then perhaps fabricating a smaller ADO recordset consisting of just that page's rows) than to perform a correlated subquery that fetches only the rows for the page.
Unfortunately the LIMIT keyword isn't available in MS Access; that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder IN
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC
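To check which rows that query returns, here is the same shape in Python's sqlite3, with TOP n rewritten as LIMIT n and keys 1 to 100 as made-up data. The inner query keeps the 55 highest keys and the outer query takes the 25 lowest of those, so the result is a page counted back from the end (46 to 70 here), not rows 31 to 55.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (MyOrder INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(i,) for i in range(1, 101)])

# Access: SELECT TOP n ... ORDER BY ...   SQLite: SELECT ... ORDER BY ... LIMIT n
NESTED_TOP_SQL = """
SELECT MyOrder FROM Table1
WHERE MyOrder IN (SELECT MyOrder FROM Table1 ORDER BY MyOrder DESC LIMIT 55)
ORDER BY MyOrder ASC
LIMIT 25
"""
page = [row[0] for row in conn.execute(NESTED_TOP_SQL)]
print(page[0], page[-1])  # 46 70
```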
If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE first_name = 'Bob'
OR first_name = 'Sid'
OR first_name = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field, want to read 3 rows, and know the ID of the first row you want to read, then you can use "TOP" to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can query Bob, Sid and Billy with the following query:
SELECT TOP 3 first_name, age
From Table
WHERE ID >= 2
ORDER BY ID
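A sketch of that TOP 3 / WHERE ID >= query in Python's sqlite3, where TOP 3 becomes LIMIT 3; People is a stand-in table name because Table is a reserved word. Remembering the last ID of each page and starting the next page after it gives stable paging even with gaps such as 2, 6, 8, 15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [(1, "Max", 23), (2, "Bob", 40), (6, "Sid", 25), (8, "Billy", 18), (15, "Sally", 19)],
)


def page_from(conn, first_id, size):
    """Return (rows, last_id): `size` rows starting at ID >= first_id, in ID order."""
    rows = conn.execute(
        "SELECT ID, first_name, age FROM People WHERE ID >= ? ORDER BY ID LIMIT ?",
        (first_id, size),
    ).fetchall()
    last_id = rows[-1][0] if rows else None
    return [(name, age) for _, name, age in rows], last_id


page, last_id = page_from(conn, 2, 3)
print(page)  # [('Bob', 40), ('Sid', 25), ('Billy', 18)]
# The next page starts just after the last ID seen, so gaps in ID do not matter.
next_page, _ = page_from(conn, last_id + 1, 3)
print(next_page)  # [('Sally', 19)]
```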
