How should I represent a unique super-admin privilege in a database?

My project needs to have a number of administrators, out of which only one will have super-admin privileges.
What is the best way to represent this in the database?

There are a few ways to do this.
Number 1: Have a column on your administrator (or user) table called IsSuperAdmin and have an insert/update trigger to ensure that only one has it set at any given time.
Number 2: Have a TimestampWhenMadeSuperAdmin column in your table. Then, in your query to figure out who it is, use something like:
select user_id from users
where TimestampWhenMadeSuperAdmin is not null
order by TimestampWhenMadeSuperAdmin desc
fetch first 1 row only;
Number 3/4: Put the SuperAdmin user ID into a separate table, using either the trigger or last-person-made-has-the-power approach from numbers 1 or 2.
Personally, I like number 2, since it gives you what you need without unnecessary triggers, and there's an audit trail as to who had the power at any given time (though not a complete audit trail, since it only stores the most recent time each person was made a SuperAdmin).
The trouble with number 1 is deciding what to do when you simply clear the current SuperAdmin's flag: either you have to give the power to someone else, or nobody has it. In other words, you can get yourself into a situation where there is no SuperAdmin. And numbers 3 and 4 just complicate things with an extra table.
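For reference, the enforcement trigger for number 1 could look something like this (a minimal sketch in SQL Server syntax; the trigger name is made up, while the users table and IsSuperAdmin column come from the option above):

CREATE TRIGGER trg_OneSuperAdmin ON users
AFTER INSERT, UPDATE
AS
BEGIN
    -- Reject any change that leaves more than one super-admin flagged.
    IF (SELECT COUNT(*) FROM users WHERE IsSuperAdmin = 1) > 1
    BEGIN
        RAISERROR('Only one user may be super-admin at a time.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;

Note that, as described above, nothing here stops you from ending up with zero super-admins.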

Use a roles/groups approach. You have a table containing all the possible roles, and then you have an intersection table containing the key of the user and the key of each role they belong to; there can be multiple entries per user, since each user can have several roles (or belong to several groups).
Also, don't call them super admin - just admin is fine; call the rest power user or something similar.
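A minimal schema sketch of that approach (all table and column names here are illustrative):

CREATE TABLE roles (
    role_id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL       -- e.g. 'admin', 'power user'
);

CREATE TABLE user_roles (
    user_id INT NOT NULL REFERENCES users (user_id),
    role_id INT NOT NULL REFERENCES roles (role_id),
    PRIMARY KEY (user_id, role_id)  -- a user can hold several roles
);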

Simple, yet effective: UserId = 1. Your application will always know it is the SuperUser.

Related

My own database autoincrement - Avoid duplicated keys

I have a Java EE Web Application and a SQL Server Database.
I intend to cluster my database later.
Now, I have two tables:
- Users
- Places
But I don't want to use SQL Server's auto id.
I want to generate my own ids because of the cluster.
So, I've created a new table, Parameter, with two columns: TableName and LastId. The Parameter table stores the last id handed out for each table. When I add a new user, my addUser method does this:
Queries the last id from the Parameter table and increments it by 1;
Inserts the new user;
Updates the last id to the incremented value.
It works. But it's a web application, so what about 1,000 people using it simultaneously? Some of them might get the same last id. How can I solve this? I've tried synchronized, but it didn't work.
What do you suggest? Yes, I have to avoid auto-increment.
I know that the user has to wait.
Automatic ID may work better in a cluster, but if you want to be database-portable or implement the allocator yourself, the basic approach is to work in an optimistic loop.
I prefer 'Next ID', since it makes the logic cleaner, so I'm going to use that in this example.
SELECT the NextID from your allocator table;
UPDATE the allocator row: SET NextID = NextID + Increment WHERE NextID = <the value you read>;
loop while RowsAffected != 1.
Of course, you'll also use the TableName condition when selecting/updating, to pick the appropriate allocator row.
You should also look at allocating in blocks -- Increment = 200, say -- and caching them in the appserver. This will give better concurrency and be a lot faster than hitting the DB for every id.
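A minimal sketch of that loop in T-SQL (SQL Server, per the question; the Parameter table and TableName column come from the question, and the NextId column follows the 'Next ID' naming above):

DECLARE @NextId INT, @Rows INT = 0;

WHILE @Rows <> 1
BEGIN
    SELECT @NextId = NextId FROM Parameter WHERE TableName = 'Users';

    -- The WHERE clause makes this a compare-and-swap: the update only
    -- succeeds if nobody else has moved NextId since we read it.
    UPDATE Parameter
    SET NextId = NextId + 1          -- or + 200 when allocating in blocks
    WHERE TableName = 'Users' AND NextId = @NextId;

    SET @Rows = @@ROWCOUNT;
END;

-- @NextId is now reserved exclusively for this caller's INSERT.

In practice the loop would live in your addUser method, with each iteration issuing the SELECT and UPDATE over JDBC; the compare-and-swap idea is the same.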

Pentaho ETL Table Input Iteration

Context
I have a table with customer information. I want to find the repeat customers in the table based on columns like:
First_Name
Last_Name
DOB
Doc_Num
FF_Num
etc.
Now, to compare one customer with the rest of the records in the same table, I need to:
read one record at a time
and compare this record with the rest, such that if one column does not match,
I then compare the remaining columns for those records.
Question
Is there a way to make the Table_Input step read or output one record at a time, reading the next record automatically once the processing of the previous record is complete? This process should continue until all the records in the table are checked/processed.
Also, I would like to know if we can iterate the same procedure instead of reading one record at a time from Table_Input.
Making your Table Input read and write row by row doesn't seem like the best solution, and I don't think it would achieve what you want (e.g. keeping track of previous records).
You could try using the Unique rows step, that can redirect a duplicate row (using the key you want) to another flow where it will be treated differently (or delete it if you don't want it). From what I can see you'll want to have multiple Unique rows to check each one of the columns.
Is there a way to make the Table_Input step read or output one record at a time but it should read the next record automatically after the processing of the previous record is complete?
Yes, it is possible: you can change the row buffer between the steps by setting Nr of rows in rowset to 1. But it is not recommended to change this property unless you are running low on memory, and it might make the tool behave abnormally.
Now, as per the comments shared, I see there are two questions:
1. You need to check the count of duplicate entries:
You can achieve this either with a Group By step or with the Unique rows step, as answered by astro11. You can get the count per name easily, and if the count is greater than 1 you can consider it a duplicate.
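The Group By route boils down to this query (a plain-SQL equivalent; the Customer table name is an assumption, the columns are from the question):

SELECT First_Name, Last_Name, DOB, COUNT(*) AS occurrences
FROM Customer
GROUP BY First_Name, Last_Name, DOB
HAVING COUNT(*) > 1;  -- more than one row = repeat customer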
2. Checking two data rows against each other:
You want to validate two names, e.g. "John S" and "John Smith". Both names should ideally be treated as the same person, hence a duplicate.
First of all, this is a data quality issue, and no tool will consider these rows the same out of the box. What you can do is use a step called "Fuzzy match". Based on the algorithm you choose, this step will give you a measure of how closely the names match. For this to work you need a separate MASTER table with all the possible names. The "Jaro Winkler" algorithm gives a good closest match.
Hope this helps :)

SQL Server: Match Identity Specification of 2 diff columns in 2 diff tables

I'm making a program that has two different tables (well, more, but those are the ones I have an issue with): one called SYNPaymentHistory, and the other called OTHERSPaymentHistory.
They have almost the same columns, except that SYNPaymentHistory includes an "ID" number for each Syndicate. The other table is for any random payment the company receives other than from the Syndicates.
I made a page on which a person fills out a payment application; when all that is done, it should print out a receipt. Receipts have a SerialNb, which is a column found in both tables (it's an INT column with an identity specification that increases by 1 on every insert).
My issue is that I want the SerialNb to be synchronized between both of them.
Ex: say I just filled out a payment application from a Syndicate, SerialNb on the top of the receipt should say 5001. If I want to fill a payment application from tickets the company earned due to a party, I'd want that receipt to have the SerialNb of 5002.
Is there some way to link two columns that are from two different tables? I think a WHILE loop could half-solve the issue: if one table auto-increments by 1, the other could have a loop that says if i = SYNPaymentHistory.SerialNb then i = i + 1 (i being OTHERSPaymentHistory's next serial). But it wouldn't work the other way around, because SYNPaymentHistory would end up not caring about OTHERSPaymentHistory's values.
Is it, in any way possible, related to diagrams? I couldn't properly understand the usage of diagrams so I'm hoping that's not the way to work it through.
If you need any additional information, I'd love to Edit in the info needed.
Programs used:
Visual Studio 2015,
SQL Server Management Studio 2014
SQL Server has a feature to address just this issue: a sequence. You create a sequence, and at any time you can ask SQL Server to give you the next value in it. Each request is guaranteed to return a unique result, in increasing order.
Create a sequence -- when you want to insert a row into either table, get the next value from the sequence and use that.
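A minimal sketch (SQL Server 2012+ syntax; the sequence name and the Amount column are illustrative):

CREATE SEQUENCE ReceiptSerialNb
    START WITH 5001
    INCREMENT BY 1;

-- Both tables draw from the same sequence, so serial numbers interleave:
INSERT INTO SYNPaymentHistory (SerialNb, Amount)
VALUES (NEXT VALUE FOR ReceiptSerialNb, 100.00);  -- e.g. gets 5001

INSERT INTO OTHERSPaymentHistory (SerialNb, Amount)
VALUES (NEXT VALUE FOR ReceiptSerialNb, 50.00);   -- e.g. gets 5002

Note that you would remove the existing identity specification from the two SerialNb columns, since the value now comes from the sequence rather than from each table's own counter.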
In addition to solving your problem, a sequence also handles multiple instances of an application running at the same time: you don't have to worry, since each instance gets its own number.
https://msdn.microsoft.com/en-us/library/ff878091.aspx

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive. (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is classically the application server's job. If you want to build your own tracking architecture, do it in your own layer.
In any case you will need an application server for it. You are not going to update a tracking field in the same transaction as the SELECT, are you? And what about rollbacks? So you need some manager that first runs the SELECT and then writes the tracking information. And what is the point of saving the tracking information together with the entity data by sending it back to the DB? Save it to a file on the application server.
You could update the column in the table as you suggested, but if it were me I'd log the event to another table: the id of the record, a datetime, a userid (maybe an IP address, browser version, etc.), just about anything else I could capture that was even possibly relevant. (For example, six months from now your manager decides s/he wants to know not only which records were used the most, but also which users are using the most records, or at what time of day that usage happens, etc.)
This type of information can be useful for things you've never even thought of down the road, and if the table starts to grow large you can always roll it up and prune it to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available, and it is impossible to re-create historically.
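A sketch of such a log table (T-SQL; every name here is hypothetical):

CREATE TABLE ReadLog (
    record_id INT NOT NULL,                       -- id of the row that was read
    read_at DATETIME NOT NULL DEFAULT GETDATE(),  -- when it was read
    user_id INT NULL                              -- plus whatever context you can capture
);

-- Issued alongside each read:
INSERT INTO ReadLog (record_id, user_id) VALUES (42, 7);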
In terms of making sure the application doesn't slow down, you may want to SELECT the data from within a stored procedure that also issues the logging command, so that the client is not doing two round trips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async AJAX call to issue the logging action, which wouldn't slow down the user's experience at all.
Adding a new column to track SELECTs is not good practice, because it may affect database performance, and database performance is a critical concern in database server administration.
Instead, you can use a very good database feature called auditing; it is easy to set up and puts less stress on the database.
For more information, search for "database auditing for SELECT statements".
Use another table as a key/value pair, with two columns (e.g. id_selected, times), to store the ids of the records you select in your standard table, and increment the times value by 1 every time a record is selected.
To do this you'd do a mass insert/update of the selected ids from your select query into the counting table. As a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1 = 'somevalue';

INSERT INTO countTable (id_selected, times)
SELECT id, 1 FROM myTable WHERE stuff1 = 'somevalue'  # or just build a list of ids as values from your last result
ON DUPLICATE KEY UPDATE times = times + 1;
The ON DUPLICATE KEY syntax is off the top of my head, and it's MySQL-specific. For a conditional insert-or-update in MSSQL you would need to use MERGE instead.
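A rough MERGE equivalent (SQL Server syntax; table and column names as in the example above):

MERGE countTable AS t
USING (SELECT id FROM myTable WHERE stuff1 = 'somevalue') AS s
    ON t.id_selected = s.id
WHEN MATCHED THEN
    UPDATE SET t.times = t.times + 1
WHEN NOT MATCHED THEN
    INSERT (id_selected, times) VALUES (s.id, 1);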

SQL pagination for on-the-fly data

I'm new to pagination, so I'm not sure I fully understand how it works. But here's what I want to do.
Basically, I'm creating a search engine of sorts that generates results from a database (MySQL). These results are merged together algorithmically, and then returned to the user.
My question is this: When the results are merged on the backend, do I need to create a temporary view with the results that is then used by the PHP pagination? Or do I create a table? I don't want a bunch of views and/or tables floating around for each and every query. Also, if I do use temporary tables, when are they destroyed? What if the user hits the "Back" button on his/her browser?
I hope this makes sense. Please ask for clarification if you don't understand. I've provided a little bit more information below.
MORE EXPLANATION: The database contains English words and phrases, each of which is mapped to a concept (Example: "apple" is 0.67 semantically-related to the concept of "cooking"). The user can enter in a bunch of keywords, and find the closest matching concept to each of those keywords. So I am mathematically combining the raw relational scores to find a ranked list of the most semantically-related concepts for the set of words the user enters. So it's not as simple as building a SQL query like "SELECT * FROM words WHERE blah blah..."
It depends on your database engine (i.e. which SQL dialect), but nearly every SQL flavor has support for paginating a query.
For example, MySQL has LIMIT and MS SQL has ROW_NUMBER.
So you build your SQL as usual, and then you just add the database engine-specific pagination clause, and the server automatically returns only, say, rows 10 to 20 of the query result.
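For example (illustrative queries; the words table and its columns are assumptions, not from the question):

-- MySQL: skip the first 10 rows, return the next 10
SELECT * FROM words ORDER BY score DESC LIMIT 10, 10;

-- MS SQL (2005+): the same page using ROW_NUMBER
SELECT word, score
FROM (
    SELECT word, score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
    FROM words
) AS numbered
WHERE rn BETWEEN 11 AND 20;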
EDIT:
So the final query (which selects the data that is returned to the user) selects data from some tables (temporary or not), as I expected.
It's a SELECT query, which you can page with LIMIT in MySQL.
Your description sounds to me as if the actual calculation is way more resource-hogging than the final query which returns the results to the user.
So I would do the following:
get the individual result tables for the entered words, and save them in a table in a way that lets you get the data for this specific query later (for example, with an additional column like SessionID or QueryID). No pagination here.
query these result tables again for the final query that is returned to the user.
Here you can do paging by using LIMIT.
So you have to do the actual calculation (the resource-hogging queries) only once when the user "starts" the query. Then you can return paginated results to the user by just selecting from the already populated results table.
EDIT 2:
I just saw that you accepted my answer, but still, here's more detail about my usage of "temporary" tables.
Of course this is only one possible way to do it. If the expected result is not too large, returning the whole resultset to the client, keeping it in memory and doing the paging client side (as you suggested) is possible as well.
But if we are talking about real huge amounts of data of which the user will only view a few (think Google search results), and/or low bandwidth, then you only want to transfer as little data as possible to the client.
That's what I was thinking about when I wrote this answer.
So: I don't mean a "real" temporary table, I'm talking about a "normal" table used for saving temporary data.
I'm way more proficient in MS SQL than in MySQL, so I don't know much about temp tables in MySQL.
I can tell you how I would do it in MS SQL, but maybe there's a better way to do this in MySQL that I don't know.
When I have to page a resource-intensive query, I want to do the actual calculation once, save it in a table, and then query that table several times from the client (to avoid redoing the calculation for each page).
The problem is that in MS SQL, a temp table only exists within the scope of the connection (or stored procedure) where it is created.
So I can't use a temp table for that, because it would be gone when I want to query it the second time.
So I use "real" tables for things like that.
I'm not sure whether I understood your algorithm example correctly, so I'll simplify it a bit. I hope I can make my point clear anyway:
This is the table (MySQL syntax; the GUID is simply stored as a 36-character string):
create table AlgorithmTempTable
(
    QueryID char(36),
    Rank float,
    Value float
);
As I said before - it's not literally a "temporary" table, it's actually a real permanent table that is just used for temporary data.
Now the user opens your application, enters his search words and presses the "Search" button.
Then you start your resource-heavy algorithm to calculate the result once, and store it in the table:
insert into AlgorithmTempTable (QueryID, Rank, Value)
select '12345678-9012-3456789', foo, bar
from Whatever;

insert into AlgorithmTempTable (QueryID, Rank, Value)
select '12345678-9012-3456789', foo2, bar2
from SomewhereElse;
The GUID must be known to the client. Maybe you can use the client's SessionID for that (if there is one, and if the client can't start more than one query at once), or you can generate a new GUID on the client each time the user presses the "Search" button, or whatever.
Now all the calculation is done, and the ranked list of results is saved in the table.
Now you can query the table, filtering by the QueryID:
select Rank, Value
from AlgorithmTempTable
where QueryID = '12345678-9012-3456789'
order by Rank
limit 0, 10
Because of the QueryID, multiple users can do this at the same time without interfering with each other's queries. If you create a new QueryID for each search, the same user can even run multiple queries at once.
Now there's only one thing left to do: delete the temporary data when it's not needed anymore (only the data! The table is never dropped).
So, if the user closes the query screen:
delete
from AlgorithmTempTable
where QueryID = '12345678-9012-3456789'
This is not ideal in some cases, though. If the application crashes, the data stays in the table forever.
There are several better ways. Which one is the best for you depends on your application. Some possibilities:
You can add a datetime column with the current time as its default value, and then run a nightly (or weekly) job that deletes everything older than X (see the sketch after this list)
Same as above, but instead of a weekly job you can delete everything older than X every time someone starts a new query
If you have a session per user, you can save the SessionID in an additional column in the table. When the user logs out or the session expires, you can delete everything with that SessionID in the table
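A sketch of the first cleanup option, assuming a CreatedAt column (defaulting to the current timestamp) has been added to the table:

-- Run nightly: purge query results older than, say, 7 days.
DELETE FROM AlgorithmTempTable
WHERE CreatedAt < NOW() - INTERVAL 7 DAY;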
Paging results can be very tricky. The way I have done this is as follows: set an upper-bound limit for any query that may be run, say 5,000. If a query returns more than 5,000 rows, limit the results to 5,000.
This is best done using a stored procedure.
Store the results of the query in a temp table.
Select page X's worth of data from the temp table.
Also return the current page and the total number of pages.
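A T-SQL sketch of that stored procedure (the SearchResults table and its id/score columns are hypothetical):

CREATE PROCEDURE GetPagedResults
    @Page INT,
    @PageSize INT
AS
BEGIN
    -- Cap the result set at 5,000 rows, keeping the best-ranked ones.
    SELECT TOP 5000
           id, score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
    INTO #Results
    FROM SearchResults
    ORDER BY score DESC;

    -- Page X of the capped results.
    SELECT id, score
    FROM #Results
    WHERE rn BETWEEN (@Page - 1) * @PageSize + 1 AND @Page * @PageSize;

    -- Current page and total number of pages.
    SELECT @Page AS CurrentPage,
           CEILING(COUNT(*) * 1.0 / @PageSize) AS TotalPages
    FROM #Results;
END;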
