Should I create an index on the created_at column in my database? [closed] - database

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Maybe it is a dumb question, but in my database schema I'm doing time-based analysis on a simple posts table.
Is it a good idea to create an index on the created_at column in PostgreSQL, or does the database already index created_at?
In my posts table:
| post_id | user_id| some_post_stuff... | created_at |
|:--------:|:--------------:|:--------:|:--------------:|
| 1| 1 | hello #world | 05/05/2021 |
| 2| 1| #stackoverflow is best | 05/05/2021 |
In my tags table
| tag_id | tag | trended_at |
|:--------:|:--------------:|:--------:|
| 1 | world | 05/05/2021 |
| 2 | stackoverflow | 05/05/2021 |
In my post_tags table
| tag_id| post_id| created_at |
|:--------:|:--------------:|:--------:|
| 1| 1| 05/05/2021 |
| 2| 2| 05/05/2021 |
I have written a function that finds trending posts with the most popular tags. Every 15 minutes, a database cron job (a Node.js client) runs a GROUP BY query on post_tags to find the trending tags and then selects the matching posts. Should I create an index on post_tags.created_at? Inserting into this table already updates the primary key index, but created_at is important for this condition.
Thanks for your help.

Postgres does not create indexes for you, except for the ones that back primary key and unique constraints. If you want to index anything else, you need to add the index yourself.
Should you index a column? It depends on how you're querying. Indexes are useful for both searching and ordering. Without indexes, the database may have to scan the whole table or spend a lot of time sorting the results.
You should index created_at if your queries have to search a lot of rows and order them with order by created_at, or use created_at in a where clause like where created_at between '2015-01-01' and '2015-12-31 23:59:59'.
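To see what the index changes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres, so it runs without a server. The post_tags layout follows the question; the timestamps, the index name idx_post_tags_created_at and the trending_tags helper are made up for illustration, and dates are stored as ISO-8601 text so that string comparison follows chronological order. It prints the plan for a created_at range query before and after creating the index. In Postgres the statement would be `CREATE INDEX idx_post_tags_created_at ON post_tags (created_at)`, and `EXPLAIN` tells you whether the planner actually uses it; on a small table it may still choose a sequential scan.

```python
import sqlite3

# Stand-in for the question's post_tags table (SQLite instead of Postgres).
SCHEMA = """
CREATE TABLE post_tags (
    tag_id     INTEGER NOT NULL,
    post_id    INTEGER NOT NULL,
    created_at TEXT    NOT NULL,   -- ISO-8601 text sorts chronologically
    PRIMARY KEY (tag_id, post_id)
);
"""

RANGE_QUERY = "SELECT post_id FROM post_tags WHERE created_at BETWEEN ? AND ?"


def query_plan(conn, sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def trending_tags(conn, since, limit=10):
    """Count tag usage since a timestamp, most used first (hypothetical cron query)."""
    return conn.execute(
        "SELECT tag_id, COUNT(*) AS uses FROM post_tags "
        "WHERE created_at >= ? GROUP BY tag_id "
        "ORDER BY uses DESC, tag_id LIMIT ?",
        (since, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO post_tags VALUES (?, ?, ?)",
    [(1, 1, "2021-05-05 10:00"), (2, 2, "2021-05-05 10:05"),
     (2, 3, "2021-05-05 10:10"), (1, 4, "2021-05-04 09:00")],
)

params = ("2021-05-05 00:00", "2021-05-05 23:59:59")
before = query_plan(conn, RANGE_QUERY, params)  # plan without the index
conn.execute("CREATE INDEX idx_post_tags_created_at ON post_tags (created_at)")
after = query_plan(conn, RANGE_QUERY, params)   # plan with the index

print("before:", before)
print("after: ", after)
print(trending_tags(conn, "2021-05-05 00:00"))
```

The "before" plan scans the whole table, while the "after" plan searches the new index for the date range, which is the situation the 15-minute job is in once post_tags grows.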
If your queries return a small number of rows, you might not have to index created_at. For example: select * from posts where user_id = X order by created_at would probably use an index on user_id to filter the rows down to a small set. Sorting them by created_at will be quick, so no index is necessary.
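The same kind of sketch (sqlite3 standing in for Postgres, hypothetical index name idx_posts_user_id) shows the plan for the user_id example: the index narrows the rows down to one user, and ORDER BY created_at is handled by a separate sort step, which SQLite reports as a temp B-tree. That sort is cheap when only a few rows survive the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "body TEXT, created_at TEXT NOT NULL)"
)
# Only user_id is indexed; created_at deliberately is not.
conn.execute("CREATE INDEX idx_posts_user_id ON posts (user_id)")

sql = "SELECT * FROM posts WHERE user_id = ? ORDER BY created_at"
# Each EXPLAIN QUERY PLAN row ends with a human-readable 'detail' string.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]
for step in plan:
    print(step)
```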

Related

Ensuring that two column values are related in SQL Server

I'm using Microsoft SQL Server 2017 and was curious about how to constrain a specific relationship. I'm having a bit of trouble articulating so I'd prefer to share through an example.
Consider the following hypothetical database.
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+----------------------------------------+
Orders
+-----------------------------------+
| Id | CustomerId | AddressId |
+-----------------------------------+
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 3 | <--- Invalid Customer/Address Pair
+-----------------------------------+
Notice that the final Order links a customer to an address that isn't theirs. I'm looking for a way to prevent this.
(You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.)
From the related reading I was able to find, it seems that one method may be to enable a CHECK constraint targeting a User-Defined Function. This User-Defined Function would be something like the following:
WHERE EXISTS (SELECT 1 FROM Addresses WHERE Id = Order.AddressId AND CustomerId = Order.CustomerId)
While I imagine this would work, given the somewhat "generality" of the articles I was able to find, I don't feel entirely confident that this is my best option.
An alternative might be to remove the CustomerId column from the Addresses table entirely, and instead add another table with Id, CustomerId, AddressId. The Order would then reference this Id instead. Again, I don't love the idea of having to channel through an auxiliary table to get a Customer or Address.
Is there a cleaner way to do this? Or am I simply going about this all wrong?
Good question. At the root, however, it seems you are struggling to create a foreign key constraint that points at a column that is not a key (Addresses.CustomerId is not unique):
Orders.CustomerId -> Addresses.CustomerId
There is no simple built-in way to do this because it is normally not done. Good relational design keeps each kind of data in its own table only. In other words, try to avoid redundant data.
In the example above, address ownership is stored redundantly in both the Addresses table and the Orders table, so additional checks are needed to keep the two synchronized. This can easily get out of hand with bigger datasets.
You mentioned:
However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
But that is exactly what makes a relational database relational: distinct data is kept distinct and referenced by key.
I think the best solution would be to simply drop this requirement.
In other words, just go with:
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+----------------------------------------+
Orders
+--------------------+
| Id | AddressId |
+--------------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 | <--- Valid Order/Address Pair
+--------------------+
With that said, to accomplish your purpose exactly, you do have views available for this kind of thing:
create view CustomerOrders
as
select o.Id OrderId,
a.CustomerId,
o.AddressId
from Orders o
join Addresses a on a.Id = o.AddressId
I know this is a pretty trivial use case for a view, but I wanted to put in a plug for views because they are often neglected and come in handy for organizing big data sets. Created WITH SCHEMABINDING, a view can also be indexed (by giving it a unique clustered index) for performance.
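Here is a rough, runnable sketch of the normalized schema plus the view, using Python's sqlite3 module instead of SQL Server (so WITH SCHEMABINDING and indexed views are not part of it). CustomerId is no longer stored on Orders; the view derives it through the address, so order 3 can only ever belong to the customer who owns address 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Addresses (
    Id         INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers (Id),
    Address    TEXT NOT NULL
);
-- Orders no longer stores CustomerId; it is derived through the address.
CREATE TABLE Orders (
    Id        INTEGER PRIMARY KEY,
    AddressId INTEGER NOT NULL REFERENCES Addresses (Id)
);
CREATE VIEW CustomerOrders AS
SELECT o.Id AS OrderId, a.CustomerId, o.AddressId
FROM Orders o
JOIN Addresses a ON a.Id = o.AddressId;

INSERT INTO Customers VALUES (1, 'Sam'), (2, 'Jane');
INSERT INTO Addresses VALUES
    (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
INSERT INTO Orders VALUES (1, 1), (2, 3), (3, 3);
""")

rows = conn.execute(
    "SELECT OrderId, CustomerId, AddressId FROM CustomerOrders ORDER BY OrderId"
).fetchall()
print(rows)
```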
> You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
If you face performance problems, the first thing to do is create or amend proper indexes. DBMSs are usually good at join operations (given proper indexes). Yes, denormalization can sometimes help in performance tuning, but it should be a last resort. If that route is taken, one should really know what one is doing and be very careful not to lose more than one gains in the end. I doubt that you're out of options here and really need to go down that path; you're likely barking up the wrong tree. Therefore I recommend taking the "normal", sane way: just drop customerid from orders and create proper indexes.
But if you really insist, you can try to make (id, customerid) a key in addresses (with a unique constraint) and then create a foreign key based on that.
ALTER TABLE addresses
ADD UNIQUE (id, customerid);
ALTER TABLE orders
ADD FOREIGN KEY (addressid, customerid)
REFERENCES addresses (id, customerid);
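A minimal sketch of that composite foreign key, again with Python's sqlite3 module standing in for SQL Server. SQLite cannot add a foreign key with ALTER TABLE, so the constraints are declared in CREATE TABLE instead, and foreign key enforcement has to be switched on per connection with PRAGMA foreign_keys. The try_insert_order helper is hypothetical; it only reports whether a row got past the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (
    id         INTEGER PRIMARY KEY,
    customerid INTEGER NOT NULL REFERENCES customers (id),
    address    TEXT NOT NULL,
    UNIQUE (id, customerid)      -- the extra key the FK below points at
);
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    customerid INTEGER NOT NULL,
    addressid  INTEGER NOT NULL,
    FOREIGN KEY (addressid, customerid) REFERENCES addresses (id, customerid)
);
INSERT INTO customers VALUES (1, 'Sam'), (2, 'Jane');
INSERT INTO addresses VALUES
    (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
INSERT INTO orders VALUES (1, 1, 1), (2, 2, 3);
""")


def try_insert_order(order_id, customer_id, address_id):
    """Return True if the order row was accepted, False if the FK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (id, customerid, addressid) VALUES (?, ?, ?)",
                (order_id, customer_id, address_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_order(3, 1, 3))  # Sam paired with Jane's address
print(try_insert_order(3, 2, 3))  # Jane paired with her own address
```

The invalid Sam/address-3 pair from the question is rejected, while a customer paired with one of their own addresses is accepted.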

SQL Server - Multiple Identity Ranges in the Same Column

Yesterday, I was asked the same question by two different people. Their tables have a field that groups records together, like a year or location. Within those groups, they want to have a unique ID that starts at 1 and increments up sequentially. Obviously, you could search for MAX(ID), but if these applications have a lot of traffic, they'd need to lock the entire table to ensure the same ID wasn't returned multiple times. I thought about using sequences but that would mean dynamically creating a sequence for each group.
Example 1:
Records created during the year should increment by one and then restart at 1 at the beginning of the next year.
| Year | ID |
|------|----|
| 2016 | 1 |
| 2016 | 2 |
| 2017 | 1 |
| 2017 | 2 |
| 2017 | 3 |
Example 2:
A company has many locations, and they want to generate a unique ID for each customer by combining the location ID with an incrementing ID.
| Site | ID |
|------|----|
| XYZ | 1 |
| ABC | 1 |
| XYZ | 2 |
| XYZ | 3 |
| DEF | 1 |
| ABC | 2 |
One trick that is often underused is to create a clustered index on Site/ID or Year/ID, but with the ID column sorted DESC rather than ASC.
This way, when you need to scan the clustered index to get the next ID value, it only needs to check one row. I've used this on multi-billion-record tables and it runs quite quickly. You can get even better performance by partitioning the table by Site or Year; you then get the added benefit of partition elimination when you run your MAX(ID) queries.
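A sketch of the per-group counter, assuming Python's sqlite3 module as the database (the customers table, the ux_customers_site_id index and the insert_customer helper are hypothetical). The unique index on (site, id DESC) mirrors the answer's idea, although in SQLite it is an ordinary secondary index rather than the table's clustered index. BEGIN IMMEDIATE takes SQLite's write lock before MAX(id) is read, so two writers cannot compute the same next ID; on SQL Server you would need an equivalent guarantee, for example reading MAX(ID) inside a transaction with locking hints such as UPDLOCK and HOLDLOCK, which is worth verifying against your own workload.

```python
import sqlite3


def make_conn():
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN/COMMIT statements below control the transaction explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE customers (site TEXT NOT NULL, id INTEGER NOT NULL, name TEXT)"
    )
    # Within each site the highest id sorts first, so MAX(id) for one site
    # can be answered from the first matching index entry.
    conn.execute(
        "CREATE UNIQUE INDEX ux_customers_site_id ON customers (site, id DESC)"
    )
    return conn


def insert_customer(conn, site, name):
    """Insert a customer and return its per-site sequential id."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX(id)
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) + 1 FROM customers WHERE site = ?",
            (site,),
        ).fetchone()
        conn.execute(
            "INSERT INTO customers (site, id, name) VALUES (?, ?, ?)",
            (site, next_id, name),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_id


conn = make_conn()
sites = ["XYZ", "ABC", "XYZ", "XYZ", "DEF", "ABC"]
ids = [insert_customer(conn, site, f"customer {n}") for n, site in enumerate(sites)]
print(ids)
```

Feeding in the sites from Example 2 in order yields the IDs 1, 1, 2, 3, 1, 2, matching the table in the question.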

SQL Primary Key Decisions

In my scenario I am tracking a population of members and their doctor changes.
The columns concerned are:
MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date
My question is about the primary key.
In this scenario, would it be better to put the primary key on an auto-increment field, added as the first column like so:
IDENTITY |MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date
Or to create the primary key based on the business rules/uniqueness of the data?
MemberID - PK1 | Prov_Nbr - PK2 | Prov_Start_Date - PK3 | Prov_End_Date | Prov_Update_Date
This is how the data would look in the table after weekly processing:

| MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date | Note |
|----------|----------|-----------------|---------------|------------------|------|
| ABC123 | IR456 | 2014-01-01 | null | null | original record |
| ABC123 | IR102 | 2014-04-01 | null | null | new record; sets the original record's `Prov_End_Date` to the new `Prov_Start_Date` minus 1 day |

So the table looks like this:

| MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date |
|----------|----------|-----------------|---------------|------------------|
| ABC123 | IR456 | 2014-01-01 | 2014-03-31 | null |
| ABC123 | IR102 | 2014-04-01 | null | 2014-04-30 |

Still with me?
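There are situations where, based on the nature of the business, a member could have a "retro", which essentially means that these rows:

| MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date |
|----------|----------|-----------------|---------------|------------------|
| ABC123 | IR456 | 2014-01-01 | 2014-03-31 | null |
| ABC123 | IR102 | 2014-04-01 | null | 2014-04-30 |

get a new record:

| MemberID | Prov_Nbr | Prov_Start_Date | Prov_End_Date | Prov_Update_Date |
|----------|----------|-----------------|---------------|------------------|
| ABC123 | IR402 | 2014-01-01 | null | null |

essentially retro-fitting the original record with a new provider.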
Would this case ruin the uniqueness of the data? Or would SQL know how to handle this as a primary key update?
Any help with this would be much appreciated.
I would actually put both of your solutions into place, as in create an identity field as your primary key (probably clustered) and add a unique key on MemberID, Prov_Nbr, Prov_Start_Date.
The top SQL Server bloggers are almost always extolling the virtues of an identity as PK, including situations somewhat similar to this where it is a surrogate, and you can then additionally enforce your business rule with the UK. Of course, I hope I'm reading your requirements correctly, especially the "retro" part.
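A small sketch of that combination, with Python's sqlite3 module standing in for SQL Server: INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the IDENTITY column, and a UNIQUE constraint enforces the business key. The table name member_providers and the add_assignment helper are hypothetical. It also speaks to the "retro" question: the retro row differs from the original in Prov_Nbr, so it is a new key value rather than an update of an existing one, and the unique key accepts it; only an exact repeat of (MemberID, Prov_Nbr, Prov_Start_Date) is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE member_providers (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, like IDENTITY
    MemberID         TEXT NOT NULL,
    Prov_Nbr         TEXT NOT NULL,
    Prov_Start_Date  TEXT NOT NULL,
    Prov_End_Date    TEXT,
    Prov_Update_Date TEXT,
    UNIQUE (MemberID, Prov_Nbr, Prov_Start_Date)         -- business-rule key
)
""")


def add_assignment(member, prov, start, end=None, updated=None):
    """Insert a row; return its surrogate id, or None if the unique key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO member_providers "
                "(MemberID, Prov_Nbr, Prov_Start_Date, Prov_End_Date, Prov_Update_Date) "
                "VALUES (?, ?, ?, ?, ?)",
                (member, prov, start, end, updated),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


r1 = add_assignment("ABC123", "IR456", "2014-01-01", "2014-03-31")
r2 = add_assignment("ABC123", "IR102", "2014-04-01", None, "2014-04-30")
r_retro = add_assignment("ABC123", "IR402", "2014-01-01")  # the "retro" record
r_dup = add_assignment("ABC123", "IR456", "2014-01-01")    # exact business-key repeat
print(r1, r2, r_retro, r_dup)
```

Whether (MemberID, Prov_Nbr, Prov_Start_Date) is the right business key depends on the actual rules; if a member may only have one provider per start date, the unique key would be (MemberID, Prov_Start_Date) instead, and the retro row would then collide with the original.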

Relationships Between Tables in MS Access

I'm completely new to databases and am having some difficulty setting up relationships between three tables in MS Access 2013.
The idea is that I have a table with account info, a table with calls related to these accounts, and a table with all the possible call responses. I tried different combinations of relationships between them, but nothing works.
1st table - Accounts : AccountID(PK) | AccountName | Language | Country | Email
2nd table - Calls : CallID(PK) | Account | Response | Comment | Date
3rd table - Responses: ResponseID(PK) | Response
When you have a table, it usually has a Primary Key field that is the main index of the table. In order for you to connect it with other tables, you usually do that by setting Foreign Key on the other table.
Let's say you have your Accounts table, and it has AccountID field as Primary Key. This field is unique (meaning no duplicate value for this field).
Now, you have the other table called Calls and you have a Foreign Key field called AccountID there, which points to the Accounts table.
Essentially you have Accounts with the following data:
AccountID| AccountName | Language | Country | Email
1 | FirstName | EN | US | some#email.com
2 | SecondName | EN | US | some#email.com
Now you have the other table, Calls, with many calls:
CallID(PK) | AccountID(FK) | ResponseID(FK) | Comment | Date
1 | 1 | 1 | a comment | 26/10
2 | 1 | 1 | a comment | 26/10
3 | 2 | 3 | a comment | 26/10
4 | 2 | 3 | a comment | 26/10
You can see the one-to-many relationship: one AccountID (AccountID=1 in my example) has many calls (rows 1 and 2 reference AccountID=1 as a foreign key), and AccountID=2 also has two calls (rows 3 and 4).
The same goes for the Responses table.
Using this table structure:
Accounts : AccountID(PK) | AccountName | Language | Country | Email
Calls : CallID(PK) | AccountID(FK) | ResponseID(FK) | Comment | Date
Responses: ResponseID(PK) | Response
Accounts.AccountID is referenced by Calls.AccountID. 1:n – many calls for one account possible, but each call concerns just one account.
Responses.ResponseID is referenced by Calls.ResponseID. 1:n – many calls can get the same response from the prepared set, but each call gets exactly one of them.
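Access builds these relationships through the Relationships window, but the same structure can also be written as SQL DDL. Here is a sketch using Python's sqlite3 module (not Access itself); the response texts are invented placeholders, and the join at the end counts calls per account and response to show both one-to-many links at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
conn.executescript("""
CREATE TABLE Accounts (
    AccountID INTEGER PRIMARY KEY, AccountName TEXT,
    Language TEXT, Country TEXT, Email TEXT
);
CREATE TABLE Responses (ResponseID INTEGER PRIMARY KEY, Response TEXT NOT NULL);
CREATE TABLE Calls (
    CallID     INTEGER PRIMARY KEY,
    AccountID  INTEGER NOT NULL REFERENCES Accounts (AccountID),    -- many calls : one account
    ResponseID INTEGER NOT NULL REFERENCES Responses (ResponseID),  -- many calls : one response
    Comment    TEXT,
    Date       TEXT
);
INSERT INTO Accounts VALUES
    (1, 'FirstName', 'EN', 'US', 'some#email.com'),
    (2, 'SecondName', 'EN', 'US', 'some#email.com');
INSERT INTO Responses VALUES (1, 'Interested'), (2, 'Call back'), (3, 'Not interested');
INSERT INTO Calls VALUES
    (1, 1, 1, 'a comment', '26/10'), (2, 1, 1, 'a comment', '26/10'),
    (3, 2, 3, 'a comment', '26/10'), (4, 2, 3, 'a comment', '26/10');
""")

calls_per_account = conn.execute("""
SELECT a.AccountName, r.Response, COUNT(*) AS Calls
FROM Calls c
JOIN Accounts a  ON a.AccountID  = c.AccountID
JOIN Responses r ON r.ResponseID = c.ResponseID
GROUP BY a.AccountName, r.Response
ORDER BY a.AccountName
""").fetchall()
print(calls_per_account)
```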
To actually define the Relationships in Access, open the Relationships window...
... then follow the detailed instructions here:
How to define relationships between tables in an Access database

How to normalize an association between weekly participations and weekly questions?

I am implementing a contest system in which the user has to choose the correct answer of multiple questions. Each week, there is a new set of questions.
I am trying to find the correct way to store the user participations in a database. Right now I have the following data model:
Participation Week
+--------------+ +--------------+
| Id | +----------->| Id |<-+
| UserId | | | StartDate | |
| WeekId |-----+ +--------------+ |
+--------------+ |
Question |
+--------------+ |
| Id | |
| WeekId |--+
| Text |
+--------------+
The only solution I came up with is to add an Answer table that associates a participation with a question, as indicated in the following diagram:
Participation Week
+--------------+ +--------------+
+->| Id | +----------->| Id |<-+
| | UserId | | | StartDate | |
| | WeekId |-----+ +--------------+ |
| +--------------+ |
| Question |
| Answer +--------------+ |
| +------------------+ +---->| Id | |
+------| ParticipationId | | | WeekId |--+
| QuestionId |----+ | Text |
| Value | +--------------+
+------------------+
I don't think this solution is very good, because it allows a participation to have answers to questions from a different week. Adding the WeekId to the Answer table does not help.
What is the correct way to represent this information?
You could remove the Id field from the Participation table and use (UserId, WeekId) as a composite primary key for Participation. In the Answer table, you then replace the ParticipationId field with the pair (UserId, WeekId) as a foreign key reference to Participation. If your database system allows it, you can then define the fields (QuestionId, WeekId) in the Answer table to reference (Id, WeekId) in the Question table. For this, you may first have to declare a unique constraint (or unique index) on the pair (Id, WeekId) in the Question table.
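A sketch of this design, using Python's sqlite3 module as the database (SQLite does allow a foreign key to reference a column pair covered by a UNIQUE constraint). The sample weeks and questions, the (UserId, QuestionId) primary key on Answer and the answer helper are assumptions for illustration. Because both foreign keys on Answer share the WeekId column, an answer can only point at a question from the same week as its participation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Week (Id INTEGER PRIMARY KEY, StartDate TEXT NOT NULL);
CREATE TABLE Question (
    Id     INTEGER PRIMARY KEY,
    WeekId INTEGER NOT NULL REFERENCES Week (Id),
    Text   TEXT NOT NULL,
    UNIQUE (Id, WeekId)                  -- lets Answer reference the pair
);
CREATE TABLE Participation (
    UserId INTEGER NOT NULL,
    WeekId INTEGER NOT NULL REFERENCES Week (Id),
    PRIMARY KEY (UserId, WeekId)         -- composite key replaces the surrogate Id
);
CREATE TABLE Answer (
    UserId     INTEGER NOT NULL,
    WeekId     INTEGER NOT NULL,         -- shared by both foreign keys below
    QuestionId INTEGER NOT NULL,
    Value      TEXT,
    PRIMARY KEY (UserId, QuestionId),
    FOREIGN KEY (UserId, WeekId) REFERENCES Participation (UserId, WeekId),
    FOREIGN KEY (QuestionId, WeekId) REFERENCES Question (Id, WeekId)
);
INSERT INTO Week VALUES (1, '2021-05-03'), (2, '2021-05-10');
INSERT INTO Question VALUES (10, 1, 'Week 1 question'), (20, 2, 'Week 2 question');
INSERT INTO Participation VALUES (7, 1);
""")


def answer(user_id, week_id, question_id, value):
    """Return True if the answer was accepted, False if a foreign key rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Answer VALUES (?, ?, ?, ?)",
                         (user_id, week_id, question_id, value))
        return True
    except sqlite3.IntegrityError:
        return False


print(answer(7, 1, 10, "A"))  # week-1 participation, week-1 question
print(answer(7, 1, 20, "B"))  # week-2 question under a week-1 participation
```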
Do you really need to associate a participation with its week? You can get it through Question.
So:
Answer(Id,UserId,QuestionId,Value)
Question(Id,WeekId,Text)
Week(Id, StartDate)
Personally I think you have the proper implementation here.
ParticipationId links to the ID on the participation, which is keyed with the user and the week. Your Question table is keyed with the WeekId as well.
Therefore, you have the proper references all along. If this is not the case, I think we will need to see some data.
