Is there any contradiction against too many values in a table (database)? - database

I was wondering if there's any contradiction or futur problems against a table in a database which contains about 80 columns. There will be only VARCHARs, few INT and maybe 1 or 2 MESSAGE. I did some research on the net but there's nothing really talking about that kind of problem...In other terms, is this okay or even 'normal' to put that much of values inside a table??
Thanks in advance!

You shouldn't have any real problems if the fields are mostly integers. Most DBMSes have a limit on row length, so a bunch of long columns can cause issues...but unless the varchar columns are very long, you're probably OK.
I've honestly never even needed to think about that, though -- with a properly normalized database, it's quite rare to ever need that many columns in a table.

More columns you have, more memory server needs to process the records.
I recomend to use the "multiple to one" relation scheme in this case.
Example of tables:
customer
id
name
email
...
ins_app_form (Insurance application form)
id
customer_id (relation with customer)
date
... (here comes some other data if you need)
ins_app_item (Insurance application form items/fields)
id
ins_app_form_id (relation with Insurance application form)
question (the name of a question in application form)
answer (customer's answer)
So to show the application form with this scheme you will need to run a query:
SELECT
iaf.id AS application_id,
iaf.date AS `date`,
iai.question,
iai.answer
FROM ins_app_form AS iaf
LEFT JOIN ins_app_item AS iai ON iai.ins_app_form_id=iaf.id
WHERE iaf.customer_id=<ID of a customer>
This query will bring you something like this:
id date question answer
1 2014-03-31 "Year" "2008"
1 2014-03-31 "Car make" "Audi"
1 2014-03-31 "Car model" "Q7"
...

Related

Power BI combine results from two SQL-Server tables

While using Power BI for a few months now, we (the user group) encountered an issue that is not really clear to us.
We use Power-BI with a remote SQL-Server data source, we access the data source through direct query.
Let's pretend we have 2 Tables as below-
Table name: Issue
Column:
ResolutionTime(Date/Time)
IssueID(Unique Numbers)
Table Name: WorkItem
Column:
start (Date/Time)
end (Date/Time)
IssueID (Unique Numbers, Foreign Key to "Issue" table)
Table WorkItem also contain a calculated column "WorkTime" which uses this DAX-expression as below-
WorkTime = WorkItem[end] - WorkItem[start]
The two tables are configured through Power-Bi having a two-way 1:n relationship that can be queried to collect all "WorkItem"(s) assigned to an "Issue" entry, using the "IssueID" as correlation column.
To be able to compute the aggregated "work-time" for each "WorkItem", we use a new/calculated table with the following DAX expression to aggregate the total amount of time invested for a single "Issue":
SumWork =
SUMMARIZE(
WorkItem, WorkItem[IssueID], "All work per item", SUM(WorkItem[WorkTime])
)
The above table computes the total invested work-time for a particular issue, grouping/summarizing results based on the "IssueID" foreign key. This new calculated table is also configured to have a relationship with the "Issue" table, this time a "1:1" relationship, using the IssueID as correlation column.
Now to compute the time that the issue was worked on + the time for Resolution should be summarized in a calculated column inside "Issue", but this does not work:
ResolutionAndWorkTime = Issue[ResolutionTime] + SumWork["All work per item"]
But the above DAX expression fails to compile, as it always reports that it returns "more than one result", thus not being a singular result. But that is suprising, as the two table ("Issue" and "SumWork" are related to each other with a "1:1" relationship).
Tables:
Issues
IssueID ResolutionTime ResolutionAndWorkTime
1 03:20:20 ???
2 01:20:20 ???
3 00:20:20 ???
WorkItem
IssueID start end WorkTime
1 1-2-2020 3:20:20 1-2-2020 3:25:20 00:05:00
1 2-2-2020 6:20:20 2-2-2020 7:20:20 01:00:00
3 1-3-2020 3:20:20 1-3-2020 3:29:20 00:09:00
Any ideas what to look for? Data-types? Table-definition? Table-relationships? We checked other Stackoverflow questions/answers, but no good ideas retrieved so far.
NOTE that a lot of join/merge features of Power BI are not available if direct-query is used and thus joining the tables is not really an option (we think).
You need this following code for your new Calculated column.
Visit HERE To know more about RELATED.
ResolutionAndWorkTime = Issues[ResolutionTime] + RELATED(SumWork[All work per item])
Based on input provided by "mkRabbani" (see other answer) we investigated why "RELATED" does not function as expected. The problem originates in the access to the database. As suspected earlier the function delivers the expected results once the database access is switched to "import" instead of "direct-query".
As a workaround we now joins the data inside the SQL server by using traditional database views. Of course this only works for scenarios where the database is under control of the data analytics team.

Database Structure and Querying

Trying to figure out how to change a structure from what I currently have which is this:
tblHaulLogs
intLogID
intHaulType
intSerial
intOriginSource
intOrigin
intDestinationSource
intDestination
dtmHaulDate
ccyLogPay
intHauler
txtLogNotes
intInvoiceID
In this table, what I am doing is using the origin and destination source fields to determine which table the fk for the origin and destination comes from. This feels very wrong to me.
tblHaulTypes
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
Data in the Haul Types Table:
LOT, 1, 1
DEL, 1, 2
RPO, 2, 1
Now let me explain:
The first type happens when an item goes from a sales lot to another sales lot.
The second type happens when an item goes from a sales lot to a customer(sale gets delivered).
The third type happens when an item returns from the customer back to the sales lot.
Then the Item can be resold/returned/resold/returned(rent-to-own system).
Now, here are the problems I have:
An Haul Log's origin will always be the destination of the last move. Therefore I thought that the origin field is redundant. However, it's the relation between the destination of the last move and the destination of the new move that defines what the shipper gets paid and what type of haul it is.
In other words, even though the first type and the third type technically have the same fields, the type of move is not the same because of the previous move type. What do I need to do here? Am I totally missing the boat on what the structure should be?
The questions I need to answer based on this data is:
How many Items do I have on my sales lots that are new inventory(have never been sold).
How many Items do I have that have been sold and returned(doesn't matter how many times).
I'm guessing at the relationship between the various fields and tables.
Your tblHaulTypes table looks fine.
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
You're missing a haul type that accounts for deliveries from suppliers to your lots.
There has to be some table that lists your lots. I'd call it tblHaulLot.
intLotNumber
txtLotName
...
I'd make a tblHaulTransaction table that looks like this.
intTransactionID
intHaulTypeID
intHauler
intOriginOrganizationID
intDestinationOrganizationID
intOriginLot (null if origin is supplier)
intDestinationLot (null if destination is customer)
dtmHaulDate
txtLogNotes
Now, we need an tblOrganization.
intOrganizationID
txtOrganizationName
txtOrganizationAddress
...
The organization at ID 0 is your organization. Suppliers and customers would fill the rest of the table.
I'd make a tblHaulInvoice table that looks like this.
intInvoiceID
intTransactionID
ccyTransactionPay
dtmDateInvoiced
AmountInvoiced
The amount invoiced (and amount paid) have to be accounted for in some table. I don't know what ccy stands for, and I don't know your 3 letter code for a decimal (money) field.
How many Items do I have on my sales lots that are new inventory(have never been sold). How many Items do I have that have been sold and returned(doesn't matter how many times).
Nowhere in your data model is there any kind of inventory table. I'd need to know a lot more about your business to create one or more inventory tables.

How to Select a record in Access if it is not a duplicate value

Ok, from the title it seems to be impossible to understand, I'll try to be as clear as possible.
Basically, I have a table, let's call it 'records'. In this table I have some products, of which I store 'id', 'codex' (which is a unique identifier for a certain product in the whole database), 'price' and 'situation'. This last one is a string which tells me wether the product has just entered the store (in that case it is set to 'IN'), or it has already been sold ('OUT' in this case).
The database was not created by us, I HAVE to work with that although it is horribly structured... The guy who originally projected the database decided to register when a product's situation passes from 'IN' to 'OUT' in the following way: instead of UPDATEing the corresponding value in the table, he used to take the row of data with 'IN' as situation, and to DUPLICATE it setting, that time, 'OUT' as situation.
Just to sum up: if a product has not been sold yet, it will have one row of dedicated data; otherwise those rows will be two, identical except for the 'situation' field.
What I need to do is: select a product if (and ONLY if) there is no duplicate for it. Basically, I can (and should) look for a 'codex', and if I my Count(codex) ends up being >1, I do not select the row.
I hope the explanation of the process is clear enough...
I tryed many alternative (no, SELECT DISTINCT is not a solution): des anyone have an idea of how to do that? Because really, none of us three could come up with a good solution!
Here is the schema for the table, I hope it is sufficiently clear, and if not do not hesitate asking for more details.
Just as a reminder: the project is in (sigh...) VB.net, the database is in Microsoft Access (mdb).
I could not find a solution on StackOverFlow, I hope this is not a duplicate question! Thanks in advance for the help.
id codex price situation
1 1 2.50 IN
2 1 2.50 OUT
3 2 3.45 IN
4 3 21.50 IN
5 2 3.45 OUT
6 4 1.50 IN
To check if I understand what your problem is... In your example table you just want to get the lines with ID 4 a 6, right?
If is that what you want, and If you want only the not sold ones try this command
SELECT
*
FROM
records
WHERE
codex
not in
(
SELECT
codex
FROM
records
WHERE
situation ='OUT'
)

Mini/Junk Dimension Advice

I have a Client Dimension and a Fact table which tracks Sessions with Clients, these have the following columns:
Code:
[DimClient]
----------
PK_ClientKey
ClientNumber
EmailAddress
Postcode
PostcodeLongitude
PostcodeLatitude
DateOfBirth
Gender *
Sexuality *
CulturalIdentity *
LanguageSpokenAtHome *
CountryOfBirth
UsualAccommodation *
LivingWith *
OccupationStatus *
HighestLevelOfSchooling *
RegistrationDate
LastLoginDate
Status
[FactSession]
-------------
PK_SessionKey
FK_ClientKey
...
My first requirement was to start grouping the age of the Clients at a specific Session (FactSession), the best way to approach this was to create a Age Group dimension and create a foreign key (FK_AgeGroupKey) in the FactSession to the DimAgeGroup dimension.
Now I'm thinking it would be good to track all the columns with an * (above). These could (not yet proven) have a high correlation against Sessions. Reading through the DWH Toolkit it seems a Mini Dimension to accomodate all the * columns along with the Age Group would suit best, so I put together the following structure:
Code:
[DimClient]
----------
PK_ClientKey
ClientNumber
...
Status
[DimDemographic]
-----------------
PK_DemographicKey
AgeGroup
Gender
Sexuality
...
HighestLevelOfSchooling
[FactSession]
-------------
PK_SessionKey
FK_ClientKey
FK_DemographicKey
The DimDemographic table would need to utilize a SCD Type 2 to be able to track the changes over time. Would this be the best approach to my requirements?
Additionally, I have RegistrationDate and LastLoginDate columns on my Client Dimension, in the case where a Client registers but never logs in what would be the best value to put in the LastLoginDate field? Something like '1900-01-01' or NULL?
Sorry for the long post but hopefully I have given enough information Thanks in advance!
I would add a field to your client dimensions to indicate the user has never logged in. Something like:
select * form DimClient where HasUserLoggedIn = 'NO';
Its very human readable and you won't have to teach your business users about nulls. Traditionally nulls are bad in a Data Warehouse except in the case of numeric fact values, due to the complexities of null != null.
Yes, the above solution should work fine. It supports your need to track changes over time, otherwise you can have included the DimDemographic linkage directly in DimClient.
Regarding the date question, I believe you should use NULL, it means that there is no value because there was no login. Also, identifying non-logged-in would be:
select * from DimClient where LastLoginDate IS NULL
For me this reads much better than a query that uses an artificial date.

DB Design: Sort Order for Lookup Tables

I have an application where the database back-end has around 15 lookup tables. For instance there is a table for Counties like this:
CountyID(PK) County
49001 Beaver
49005 Cache
49007 Carbon
49009 Daggett
49011 Davis
49015 Emery
49029 Morgan
49031 Piute
49033 Rich
49035 Salt Lake
49037 San Juan
49041 Sevier
49043 Summit
49045 Tooele
49049 Utah
49051 Wasatch
49057 Weber
The UI for this app has a number of combo boxes in various places for these lookup tables, and my client has asked that the boxes list in this case:
CountyID(PK) County
49035 Salt Lake
49049 Utah
49011 Davis
49057 Weber
49045 Tooele
'The Rest Alphabetically
The best plan I have for accomplishing this is to add a column to each lookup table for SortOrder(numeric). I had a colleague tell me he thought that would cause the tables to violate 3rd-Normal-Form, but I think the sort order still depends on the key and only the key (even though the rest of the list is alphabetical).
Is adding the SortOrder column the best way to do this, or is there a better way I am just not seeing?
I agree with #cletus that a sort order column is a good way to go and it does not violate 3NF (because, as you said, the sort order column entries are functionally dependent on the candidate keys of the table).
I'm not sure I agree that alphanumeric is better than numeric. In the specific case of counties, there are seldom new ones created. But there is no requirement that the numbers assigned are sequential; you can allocate them with numbers that are a multiple of a hundred, for example, leaving ample room for insertions.
Yes I agree a sort order column is the best solution when the requirements call for a custom sort order like the one you cite. I wouldn't go with a numeric column however. If the data is alphanumeric, the sort order should be alphanumeric. That way you can seed the value with whatever is in the county field.
If you use a numeric field you'll have to resequence the entire table (potentially) whenever you add a new entry. So:
Columns: ID, County, SortOrder
Seed:
UPADTE County SET SortOrder = CONCAT('M-', County)
and for the special cases:
UPDATE County
SET SortOrder = CONCAT('E-' . County)
WHERE County IN ('Salt Lake', 'Utah', 'Davis', 'Weber', 'Tooele')
Arguably you may want to put another marker column in to indicate those entries are special.
I went with numeric and large multiples.
Even with the CONCAT('E-'.. example, I don't get the required sort order. That would give me Davis, SL, Tooele... and Salt Lake needs to be first.
I ended up using multiples of 10 and assigned the non-special-sort entries a value like 10000. That way the view for each lookup can have
ORDER BY SortOrder ASC, OtherField ASC
Another programmer suggested using DECODE in Oracle, or CASE statements in SQL Server, but this is a more general solution. YMMV.

Resources