Store all dates in one table - database

I have many tables in my database, and each one has one or two DATE fields. This is increasing my database size, so I am thinking of storing all DATE fields in one table and adding a relationship to all the other tables. Is this possible, and is it a good idea?
My database, example:
Old design
tblCustomer = > CustomerID, Surname, Name, DateFirstVisit, DateStopped
tblOrder = > OrderID, CustomerID, DateOrder, Order, DateShiped
tblPayment = > PaymentID, CustomerID, DatePayment, Price, DateCheck
New design
tblCustomer = > CustomerID, Surname, Name, DateInID, DateOutID
tblOrder = > OrderID, CustomerID, DateInID, Order, DateOutID
tblPayment = > PaymentID, CustomerID, DateInID, Price, DateOutID
tblDateIn = > DateInID, DateIn
tblDateOut = > DateOutID, DateOut
Can I combine tblDateIn and tblDateOut?
Thank you...

Technically, yes, you can further normalize your database this way. You could go so far as to have a Dates table that just has every date in it and use those dates by reference to a DateID, but this is over-normalization.
In addition to making simple queries more complicated because you will have to join to the dates table every time, I think you'll find that you don't save that much space and might possibly use more space. I don't know for certain what Access uses, but dates are typically stored internally as decimal values or an integer representing a count of seconds since a starting date. In any case, the space you would save in your tables by having an integer key versus Access' internal date value would be tiny and likely offset by having additional tables and indexes involved in foreign keys.
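To make the extra join concrete, here is a minimal sketch using SQLite through Python's sqlite3 module rather than Access. The table names tblOrder_old and tblOrder_new and the sample rows are made up for illustration. Both queries return the same orders, but the normalized design needs a join just to filter on a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old design: the date lives directly in the row.
    CREATE TABLE tblOrder_old (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, DateOrder TEXT);
    INSERT INTO tblOrder_old VALUES (1, 10, '2012-03-01'), (2, 11, '2012-03-05');

    -- New design: the row stores a key into a separate dates table.
    CREATE TABLE tblDateIn (DateInID INTEGER PRIMARY KEY, DateIn TEXT);
    CREATE TABLE tblOrder_new (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER,
        DateInID INTEGER REFERENCES tblDateIn(DateInID)
    );
    INSERT INTO tblDateIn VALUES (1, '2012-03-01'), (2, '2012-03-05');
    INSERT INTO tblOrder_new VALUES (1, 10, 1), (2, 11, 2);
""")

# Old design: filter on the date column directly.
old_rows = conn.execute(
    "SELECT OrderID, DateOrder FROM tblOrder_old WHERE DateOrder >= '2012-03-02'"
).fetchall()

# New design: the same question requires a join to the dates table.
new_rows = conn.execute("""
    SELECT o.OrderID, d.DateIn
    FROM tblOrder_new AS o
    JOIN tblDateIn AS d ON d.DateInID = o.DateInID
    WHERE d.DateIn >= '2012-03-02'
""").fetchall()
```

Note that the new design also carries an extra table and its primary key index, which is where the supposed space saving tends to disappear.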

Related

How to determine if a date falls under one of the ranges (ranges are stored in separate rows of another table)

I have SalesFacts Table, which contains Sales_Amount, Customer_ID and Invoice_Date.
In another table I have information about special agreements for some of the customers (columns are: Customer_ID, Agreement_Start_Date, Agreement_End_Date).
Now I would like to check whether each sale in the SalesFacts table occurred while a special agreement was active for the customer. This would be pretty easy if there were only one date range per customer. However, in my case the special agreements table contains repeated Customer_ID values, because one customer may have several date ranges during which a special agreement was active.
E.G. In SalesFact Table I have 3 transactions for one customer:
In the SpacialAgreements table I can see that there are 2 date ranges during which this customer was entitled to special agreements.
I would like to create a query, that adds additional column to my SalesFacts table, that would determine, if the transaction happened when there was a Special Agreement Active. So in case shown above, it would be:
If there was Only one date range with special agreement it would be pretty easy:
Select
S.[Sales_Amount], S.[Customer_ID], S.[Invoice_Date],
IIF(S.[Invoice_Date] >= A.[Agreement_Start_Date] and S.[Invoice_Date]<=A.[Agreement_End_Date],'YES','NO') as AGREEMENT
From SalesFacts S left join SpacialAgreements A on S.[Customer_ID] = A.[Customer_ID]
But since there are several date ranges in the SpacialAgreements table, I don't know how to achieve that properly without risking duplicates in Sales_Amount and without losing any data.
Any ideas?
If you want to get the data exactly as you have shown in the question, you can use a SELECT statement like this:
SELECT
S.[Sales_Amount],
S.[Customer_ID],
S.[Invoice_Date],
CASE WHEN EXISTS (SELECT 1
FROM SpacialAgreements A
WHERE A.Customer_ID = S.Customer_ID
AND S.[Invoice_Date] >= A.[Agreement_Start_Date]
AND S.[Invoice_Date] <= A.[Agreement_End_Date])
THEN 'YES'
ELSE 'NO'
END as Agreement
FROM SalesFacts S
So, this solution can be used if you are selecting data or creating view from this query.
If you want to have persisted value as one physical column in your SalesFacts table then you can try to solve your problem with triggers.
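As a rough check of the EXISTS approach, the sketch below runs it in SQLite via Python's sqlite3 module. The three sales and two agreement ranges are invented sample data (the question's own tables are not reproduced here), and dates are stored as ISO text so that string comparison orders them correctly. It also runs the plain LEFT JOIN to show where the duplicates come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesFacts (Sales_Amount REAL, Customer_ID INTEGER, Invoice_Date TEXT);
    CREATE TABLE SpacialAgreements (
        Customer_ID INTEGER, Agreement_Start_Date TEXT, Agreement_End_Date TEXT
    );
    -- Hypothetical sample data: one customer, three sales, two agreement ranges.
    INSERT INTO SalesFacts VALUES
        (100, 1, '2017-01-15'),
        (200, 1, '2017-03-10'),
        (300, 1, '2017-06-20');
    INSERT INTO SpacialAgreements VALUES
        (1, '2017-01-01', '2017-01-31'),
        (1, '2017-06-01', '2017-06-30');
""")

# A plain join on Customer_ID repeats every sale once per agreement row.
joined = conn.execute("""
    SELECT S.Sales_Amount
    FROM SalesFacts S
    LEFT JOIN SpacialAgreements A ON S.Customer_ID = A.Customer_ID
""").fetchall()

# EXISTS only asks "is there at least one matching range?", so each sale stays one row.
flagged = conn.execute("""
    SELECT S.Sales_Amount, S.Customer_ID, S.Invoice_Date,
           CASE WHEN EXISTS (SELECT 1 FROM SpacialAgreements A
                             WHERE A.Customer_ID = S.Customer_ID
                               AND S.Invoice_Date >= A.Agreement_Start_Date
                               AND S.Invoice_Date <= A.Agreement_End_Date)
                THEN 'YES' ELSE 'NO' END AS Agreement
    FROM SalesFacts S
    ORDER BY S.Invoice_Date
""").fetchall()
```

With this sample data the join produces six rows for three sales, while the EXISTS query keeps exactly three.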

how to get data from two tables of sqlite and sort data

I've two tables
income
expense
the problem is I want to query all the data from both tables
SELECT income.date AS IN_DATE, expense.date AS EX_DATE FROM income, expense
I get a weird result: the data from the database comes back doubled, as you can see.
you can try this out HERE
How can I get distinct results, not doubled ones? Also, I have no idea how to get the data from both tables and sort it by date descending.
My guess is that you want union all:
select 'income' as which, id, title, date
from income
union all
select 'expense' as which, id, title, date
from expense;
This will give you a result set containing the rows from the two tables, with an identifier of which table each row comes from.
You can order by date and do other manipulations if you use a subquery:
select ie.*
from (select 'income' as which, id, title, date
from income
union all
select 'expense' as which, id, title, date
from expense
) ie
order by date desc;
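The difference between the comma join from the question and UNION ALL can be seen in a small sketch, again with SQLite through Python's sqlite3 module. The columns id, title and date follow the answer above; the four rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE income  (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
    CREATE TABLE expense (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
    -- Hypothetical sample rows.
    INSERT INTO income  VALUES (1, 'salary', '2016-05-01'), (2, 'bonus', '2016-05-20');
    INSERT INTO expense VALUES (1, 'rent', '2016-05-03'), (2, 'food', '2016-05-25');
""")

# Comma join without a condition: every income row paired with every expense row.
cross = conn.execute(
    "SELECT income.date, expense.date FROM income, expense"
).fetchall()

# UNION ALL stacks the rows instead; ORDER BY applies to the combined result.
combined = conn.execute("""
    SELECT 'income' AS which, id, title, date FROM income
    UNION ALL
    SELECT 'expense' AS which, id, title, date FROM expense
    ORDER BY date DESC
""").fetchall()
```

Two rows per table give four pairs from the comma join, but also four stacked rows from UNION ALL; with three rows per table the join would give nine pairs and the union six rows.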
Your simple SELECT produces a cross product of the two tables, so you get every possible combination of rows (and therefore of IN_DATE and EX_DATE values). INNER JOIN income ON expense.id = income.id or WHERE income.id = expense.id should do the trick.
You need to match the same ids, else SQL will just output any possible combination.
SELECT income.date AS IN_DATE, expense.date AS EX_DATE FROM income, expense WHERE income.id = expense.id

Is there a quicker way of doing this type of query (finding inactive accounts)?

I have a very large table of wagering transactions. Let's say for the sake of the question I want to find the accounts of people who have wagered in the last year but not wagered in the last month, so I do something like this...
--query one
select accountnumber into #wageredrecently from activity
where _date >='2011-08-10' and transaction_type = 'Bet'
group by accountnumber
--query two
select accountnumber,firstname,lastname,email,sum(handle)
from activity a, customers c
where a.accountnumber = c.accountno
and transaction_type = 'Bet'
and _date >='2010-09-10'
and accountnumber not in (select * from #wageredrecently)
group by accountnumber,firstname,lastname,email
The problem is, this takes ages to get the data. Is there a quicker way to achieve the same in SQL?
Edit, just to be specific about the time: It takes just over 3 minutes, which is far too long for a query that is destined for a php intranet page.
Edit (11/09/2011): I've found out that the problem is the customers table. It's actually a view. It previously had good performance but now all of a sudden its performance is terrible, a simple query on it takes almost as long as the above query pair. I have therefore chosen an alternative table of customer data (that actually is a table, and not a view) and now the query pair takes about 15 seconds.
You should try to join customers after you have found and aggregated the rows from activity (I assume that handle is a column in activity).
select c.accountno,
c.firstname,
c.lastname,
c.email,
a.sumhandle
from customers as c
inner join (
select accountnumber,
sum(handle) as sumhandle
from activity
where _date >= '2010-09-10' and
transaction_type = 'Bet' and
accountnumber not in (
select accountnumber
from activity
where _date >= '2011-08-10' and
transaction_type = 'Bet'
)
group by accountnumber
) as a
on c.accountno = a.accountnumber
I also included your first query as a sub-query instead. I'm not sure what that will do for performance. It could be better, it could be worse, you have to test on your data.
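Another variant worth testing, not taken from the answers here, is to drop the NOT IN subquery and let the aggregate do the filtering with HAVING MAX(_date), so the activity table is read only once. The sketch below uses SQLite through Python's sqlite3 module with invented sample rows; whether it is faster on the real engine and data is something only a test can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (accountnumber INTEGER, _date TEXT, transaction_type TEXT, handle REAL);
    CREATE TABLE customers (accountno INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT);
    -- Hypothetical sample data: account 1 stopped betting, account 2 bet recently.
    INSERT INTO customers VALUES
        (1, 'Ann', 'Lee', 'ann@example.com'),
        (2, 'Bob', 'Ray', 'bob@example.com');
    INSERT INTO activity VALUES
        (1, '2011-01-05', 'Bet', 10),
        (1, '2011-03-09', 'Bet', 15),
        (2, '2011-02-01', 'Bet', 20),
        (2, '2011-09-01', 'Bet', 5);
""")

# Aggregate bets from the last year per account, keep only accounts whose
# latest bet is older than the one-month cutoff, then join the customer data.
inactive = conn.execute("""
    SELECT c.accountno, c.firstname, c.lastname, c.email, a.sumhandle
    FROM customers AS c
    JOIN (
        SELECT accountnumber, SUM(handle) AS sumhandle
        FROM activity
        WHERE transaction_type = 'Bet' AND _date >= '2010-09-10'
        GROUP BY accountnumber
        HAVING MAX(_date) < '2011-08-10'
    ) AS a ON a.accountnumber = c.accountno
""").fetchall()
```

Only account 1 is returned here, because account 2 has a bet after the cutoff date.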
I don't know your exact business need, but rarely will someone need access to inactive accounts over several months at a moment's notice. Depending on when you purge data, this may get worse.
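You could create an indexed view that contains the last transaction date for each account (SQL Server does not allow MAX in an indexed view, so there this would have to be a separately maintained summary table):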
max(_date) as RecentTransaction
If this table gets too large, it could be partitioned by year or month of the activity.
Have you considered adding an index on _date to the activity table? It's probably taking so long because it has to do a full table scan on that column when you're comparing the dates. Also, is transaction_type indexed as well? Otherwise, the other index wouldn't do you any good.
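One way to see whether an index is picked up is to compare query plans before and after creating it. The sketch below does this in SQLite via Python's sqlite3 module, which is an assumption made for the sake of a runnable example; the question's engine will have its own plan tooling and may choose differently. The composite index on (transaction_type, _date) and its name are also illustrative choices, not something stated in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (accountnumber INTEGER, _date TEXT, transaction_type TEXT, handle REAL)"
)

query = (
    "SELECT accountnumber FROM activity "
    "WHERE transaction_type = 'Bet' AND _date >= '2011-08-10'"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet, so the table is scanned

# Hypothetical composite index: equality column first, range column second.
conn.execute("CREATE INDEX idx_activity_type_date ON activity (transaction_type, _date)")

plan_after = plan(query)   # the plan should now mention the index by name
```

Printing plan_before and plan_after shows the switch from a full scan to an index search.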
Answering my own question, as the problem wasn't the structure of the query but one of the tables being used. It was a view and its performance was terrible. I changed to an actual table of customer data, which reduced the execution time to about 15 seconds.

What is the difference between a table index and a view index?

I am quite confused about the difference between an index on table and index on view (Indexed View). Please clarify it.
There really is none. The index on both table or view basically serves to speed up searches.
The main thing is: views normally do not have indices. When you add a clustered index to a view, you're basically "materializing" that view into a system-maintained, always automatically updated "pseudo-table" that exists on disk, uses disk space just like a table, and since it's really almost a table already, you can also add additional indices to an indexed view.
So really - between a table and an indexed view, there's little difference - and there's virtually no difference at all between indices on tables and an indexed view.
Indexes on views have some restrictions, because views can be based upon various combinations of tables and views.
In either case, they are similar, and as the underlying data changes, indexes may or may not need to be updated.
Indexes on tables are almost always used: typically you will have at least one unique index (the primary key) and may have designated one of the indexes as clustered.
Indexes on views are generally applied only as an optimization technique. As reads from a view become heavy, indexes on the view can improve the performance of queries that use it.
I've used indexed views to drastically improve the performance of queries where I want to group by a unique combination of fields and maybe calculate some aggregate SUM or count on them.
For example, consider a table that contains customer, truck, distance, date (plus about 30 other performance columns I don't want to query right now). I have hundreds of customers, they have hundreds of trucks each and each truck reports distance and other data 5 times a day. If I want to query a list of which trucks are reporting in which months, I create a view like this:
CREATE VIEW dbo.vw_DistinctUnitMonths
WITH SCHEMABINDING
AS
SELECT CustomerGroup,
CustomerId,
Vehicle,
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE) AS Month, --Converts Date to First of the Month
SUM(CASE WHEN Miles > 0 THEN Miles ELSE 0 END) AS Miles,
COUNT_BIG(*) AS Count
FROM dbo.PerformanceData
GROUP BY CustomerGroup, CustomerId, Vehicle, CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_DistinctUnitMonths ON vw_DistinctUnitMonths (CustomerGroup, CustomerId, Vehicle, Month)
GO
Here's a slow query that doesn't use the view:
--Can Be Very Slow!
SELECT CustomerGroup,
CustomerId,
Vehicle,
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE) AS Month
FROM PerformanceData
WHERE Date >= '2020-01-01'
AND Date < '2020-02-01'
GROUP BY CustomerGroup, CustomerId, Vehicle, CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE)
And here is one that runs much faster, because of the indexed view.
--Much Faster
SELECT CustomerGroup,
CustomerId,
Vehicle,
Month
FROM vw_DistinctUnitMonths WITH (NOEXPAND)
WHERE Month >= '2020-01-01'
AND Month < '2020-04-01'
GROUP BY CustomerGroup, CustomerId, Vehicle, Month
Because the indexed view is creating an index on only the unique combinations of customer, group, vehicle and month, the disk space for the view is much smaller than if I were to index those columns on the source table. Queries to the view are faster because the data in the view is concentrated to some tens of megabytes instead of the hundreds of gigabytes the source table occupies.
See also MSFT Docs: Create Indexed Views

joining latest of various usermetadata tags to user rows

I have a PostgreSQL database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. For example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course godawful slow. The best I could figure out to do it in SQL was to join sub-selects, which were also slow and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, created
FROM usermetadata
WHERE userid = 15
ORDER BY code, created DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to have the time field empty until it was replaced by a newer value, when you insert the 'deprecation date' instead. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
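The Valid-From/Valid-To idea can be sketched as follows, using SQLite through Python's sqlite3 module instead of PostgreSQL. The column names valid_from and valid_to, the index name and the set_value helper are all hypothetical. A partial unique index plays the role of the constraint that allows only one current row per user and code; PostgreSQL supports partial unique indexes as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usermetadata (
        userid INTEGER, code TEXT, content TEXT,
        valid_from TEXT NOT NULL, valid_to TEXT
    );
    -- At most one open-ended (current) row per user and code.
    CREATE UNIQUE INDEX one_current_per_code
        ON usermetadata (userid, code) WHERE valid_to IS NULL;
""")

def set_value(userid, code, content, now):
    """Close the current row for (userid, code), then insert the new one."""
    with conn:  # commit both statements together, roll back on error
        conn.execute(
            "UPDATE usermetadata SET valid_to = ? "
            "WHERE userid = ? AND code = ? AND valid_to IS NULL",
            (now, userid, code),
        )
        conn.execute(
            "INSERT INTO usermetadata VALUES (?, ?, ?, ?, NULL)",
            (userid, code, content, now),
        )

set_value(15, "QHS", "21", "2008-08-24 09:44:44")
set_value(15, "QHE", "10", "2008-08-24 08:47:57")
set_value(15, "QHS", "20", "2008-08-24 13:36:33")

# The current values are simply the rows that have not been closed yet.
current = conn.execute(
    "SELECT code, content FROM usermetadata "
    "WHERE userid = 15 AND valid_to IS NULL ORDER BY code"
).fetchall()
```

The history stays in the table, and reading the current state no longer needs a subselect or a sort per code.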
A subselect is the standard way of doing this sort of thing. You just need a unique constraint on userid, code, and created, and then you can run the following (#userId stands for the user ID parameter):
SELECT m.*
FROM usermetadata AS m
JOIN (
SELECT userid, code, MAX(created) AS lastdate
FROM usermetadata
GROUP BY userid, code
) AS latest ON
m.userid = latest.userid
AND m.code = latest.code
AND m.created = latest.lastdate
WHERE
m.userid = #userId
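To try the subselect approach on the sample rows from the question, here is a small sketch in SQLite via Python's sqlite3 module (an assumption; the question uses PostgreSQL). Timestamps are stored as text, which compares correctly here only because all values share the same format and UTC offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usermetadata (userid INTEGER, code TEXT, content TEXT, created TEXT,
                               UNIQUE (userid, code, created));
    INSERT INTO usermetadata VALUES
        (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'),
        (15, 'QHE', '8',  '2008-08-24 12:07:08.660519-04'),
        (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'),
        (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');
""")

# Find the newest timestamp per (userid, code), then join back to fetch the content.
latest = conn.execute("""
    SELECT m.userid, m.code, m.content
    FROM usermetadata AS m
    JOIN (
        SELECT userid, code, MAX(created) AS last_created
        FROM usermetadata
        GROUP BY userid, code
    ) AS l ON l.userid = m.userid AND l.code = m.code AND l.last_created = m.created
    WHERE m.userid = ?
    ORDER BY m.code
""", (15,)).fetchall()
```

Dropping the WHERE clause returns the latest value of every code for all users in one pass, which is what the question ultimately asks for.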
