Keeping record from duplicating itself

I have a POS (point of sale) database where we store articles, barcodes (eannos) and orderlines (and much more, but I'm only using these for now).
I need a list of which articles and which barcodes have been sold, and I'm nearly there. The one thing I can't get right is articles that have more than one barcode (sometimes we add multiple barcodes to an article when the product comes in different colours, but we want to stick to a single article number).
My SQL query is:
select Description, eanno, eannoid, sum(count) from PurchaseOrderLines
join eannos on PurchaseOrderLines.SizeColorID=EanNos.SizeColorID
where PurchaseOrderLines.articleid in (select articleid from articles where articleno in ('60321129','60314516'))
group by Description, eanno, eannoid
The result is:
Description                                  Eanno          Eannoid  Sold
Top l/s AOP Baby Dark Sapphire 74            7325850944711  141588   2.00
Top l/s AOP Baby Dark Sapphire 80            7325850944735  141589   2.00
Top l/s AOP Baby Dark Sapphire 86            7325850944759  141590   4.00
Top l/s AOP Baby Dark Sapphire 92            7325850944773  141591   4.00
Bow Tie Solid Preschool Ski Patrol One size  7325851134869  141819   30.00
Bow Tie Solid Preschool Ski Patrol One size  7325851176012  142937   30.00
The last line of the result is a duplicate: only 30 of "Bow Tie Solid Preschool Ski Patrol One size" have been sold, but I get two lines because the query returns one row per barcode of the same article, and each row carries the full summed count from the orderlines.
How can I make sure only one record shows?
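One possible approach, sketched here rather than taken from the thread: pick a single barcode per SizeColorID before joining, so each order line matches exactly one barcode row. The sketch uses Python's built-in sqlite3 as a stand-in for SQL Server so it can be run anywhere. It assumes that Description is a column of PurchaseOrderLines, that the two Bow Tie barcodes share one SizeColorID, and that keeping the lowest EanNoID is acceptable. The order quantities (10 and 20) are made up to total 30, and the articleno filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EanNos (EanNoID INTEGER PRIMARY KEY, EanNo TEXT, SizeColorID INTEGER);
-- Assumption: Description lives on the order line. "Count" is quoted to keep the original name.
CREATE TABLE PurchaseOrderLines (Description TEXT, SizeColorID INTEGER, "Count" REAL);

INSERT INTO EanNos VALUES
    (141588, '7325850944711', 1),
    (141819, '7325851134869', 2),   -- two barcodes share SizeColorID 2
    (142937, '7325851176012', 2);
INSERT INTO PurchaseOrderLines VALUES
    ('Top l/s AOP Baby Dark Sapphire 74', 1, 2),
    ('Bow Tie Solid Preschool Ski Patrol One size', 2, 10),
    ('Bow Tie Solid Preschool Ski Patrol One size', 2, 20);
""")

query = """
SELECT pol.Description, e.EanNo, e.EanNoID, SUM(pol."Count") AS Sold
FROM PurchaseOrderLines AS pol
-- Reduce EanNos to one barcode per SizeColorID before joining.
JOIN (SELECT SizeColorID, MIN(EanNoID) AS EanNoID
      FROM EanNos
      GROUP BY SizeColorID) AS first_ean
  ON first_ean.SizeColorID = pol.SizeColorID
JOIN EanNos AS e ON e.EanNoID = first_ean.EanNoID
GROUP BY pol.Description, e.EanNo, e.EanNoID
ORDER BY e.EanNoID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The derived table is plain SQL, so the same shape should carry over to SQL Server. Simply grouping by Description and taking MIN(eanno) on the original join would not be enough, because the join has already doubled the order lines and the sum would come out as 60.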

Related

Is normalization always necessary and more efficient?

I got into databases and normalization. I am still trying to understand normalization and I am confused about its usage. I'll try to explain it with this example.
Every day I collect data which would look like this in a single table:
TABLE: CAR_ALL
ID   DATE        CAR       LOCATION   FUEL  FUEL_USAGE  MILES   BATTERY
123  01.01.2021  Toyota    New York   40.3  3.6         79321   78
520  01.01.2021  BMW       Frankfurt  34.2  4.3         123232  30
934  01.01.2021  Mercedes  London     12.7  4.7         4321    89
123  05.01.2021  Toyota    New York   34.5  3.3         79515   77
520  05.01.2021  BMW       Frankfurt  20.1  4.6         123489  29
934  05.01.2021  Mercedes  London     43.7  5.0         4400    89
In this example I get data for thousands of cars every day. ID, CAR and LOCATION never change; all the other columns can take different values each day. If I understood correctly, normalizing would make it look like this:
TABLE: CAR_CONSTANT
ID   CAR       LOCATION
123  Toyota    New York
520  BMW       Frankfurt
934  Mercedes  London
TABLE: CAR_MEASUREMENT
GUID  ID   DATE        FUEL  FUEL_USAGE  MILES   BATTERY
1     123  01.01.2021  40.3  3.6         79321   78
2     520  01.01.2021  34.2  4.3         123232  30
3     934  01.01.2021  12.7  4.7         4321    89
4     123  05.01.2021  34.5  3.3         79515   77
5     520  05.01.2021  20.1  4.6         123489  29
6     934  05.01.2021  43.7  5.0         4400    89
I have two questions:
1. Does it make sense to create an extra table for DATE?
2. New cars may appear in the collected data. For every row I insert into CAR_MEASUREMENT, I would have to check whether the ID is already in CAR_CONSTANT and, if it isn't, insert it. That means looking through CAR_CONSTANT thousands of times every day. Wouldn't it be more efficient to just insert the whole data as one row into CAR_ALL, so I wouldn't have to check CAR_CONSTANT every time?

The benefits of normalization depend on your specific use case. I can see both pros and cons to normalizing your schema, but it's impossible to say which is better without knowing more about how you use the data.
Pros:
With your schema, normalization could reduce the amount of data consumed by your DB since CAR_MEASUREMENT will probably be much larger than CAR_CONSTANT. This scales up if you are able to factor out additional data into CAR_CONSTANT.
Normalization could also improve data consistency if you ever begin tracking additional fixed data about a car, such as license plate number. You could simply update one row in CAR_CONSTANT instead of potentially thousands of rows in CAR_ALL.
A normalized data structure can make it easier to query data for a specific car. Using a LEFT JOIN, the DBMS can search through the CAR_MEASUREMENT table on the integer ID column instead of having to compare two string columns.
Cons:
As you noted, the normalized form requires an additional lookup and possible insert to CAR_CONSTANT for every addition to CAR_MEASUREMENT. Depending on how fast you are collecting this data, those extra queries could be too much overhead.
To answer your questions directly:
1. I would not create an extra table just for the date. The date is part of the CAR_MEASUREMENT data and should not be separated. The only exception I can think of is if you eventually collect measurements that do not contain any car data. In that case it would make sense to split CAR_MEASUREMENT into separate MEASUREMENT and CAR_DATA tables, with MEASUREMENT containing the date and CAR_DATA containing just the car-specific data.
2. See above. If you need to query data for a specific car, the normalized form can be more efficient. If not, the additional INSERT overhead may not be worth it.
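To get a feel for the lookup cost mentioned under "Cons", here is a minimal sketch of the normalized load using Python's built-in sqlite3. It assumes ID is the primary key of CAR_CONSTANT, so the existence check is an index lookup rather than a scan, and it uses SQLite's INSERT OR IGNORE to fold "check, then insert" into one statement. Other engines spell this differently (for example INSERT IGNORE in MySQL or MERGE in SQL Server), so treat the syntax as an illustration only. The helper name load is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CAR_CONSTANT (ID INTEGER PRIMARY KEY, CAR TEXT, LOCATION TEXT);
CREATE TABLE CAR_MEASUREMENT (
    GUID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID INTEGER NOT NULL REFERENCES CAR_CONSTANT(ID),
    DATE TEXT, FUEL REAL, FUEL_USAGE REAL, MILES INTEGER, BATTERY INTEGER
);
""")

# The six rows from the question, in CAR_ALL column order.
daily_rows = [
    (123, "01.01.2021", "Toyota", "New York", 40.3, 3.6, 79321, 78),
    (520, "01.01.2021", "BMW", "Frankfurt", 34.2, 4.3, 123232, 30),
    (934, "01.01.2021", "Mercedes", "London", 12.7, 4.7, 4321, 89),
    (123, "05.01.2021", "Toyota", "New York", 34.5, 3.3, 79515, 77),
    (520, "05.01.2021", "BMW", "Frankfurt", 20.1, 4.6, 123489, 29),
    (934, "05.01.2021", "Mercedes", "London", 43.7, 5.0, 4400, 89),
]

def load(rows):
    """Insert unseen cars once, then append every measurement (hypothetical helper)."""
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO CAR_CONSTANT (ID, CAR, LOCATION) VALUES (?, ?, ?)",
            [(r[0], r[2], r[3]) for r in rows],
        )
        conn.executemany(
            "INSERT INTO CAR_MEASUREMENT (ID, DATE, FUEL, FUEL_USAGE, MILES, BATTERY) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            [(r[0], r[1], *r[4:]) for r in rows],
        )

load(daily_rows)

car_count = conn.execute("SELECT COUNT(*) FROM CAR_CONSTANT").fetchone()[0]
measurement_count = conn.execute("SELECT COUNT(*) FROM CAR_MEASUREMENT").fetchone()[0]

# Querying one car joins on the integer key.
toyota = conn.execute("""
    SELECT c.CAR, m.DATE, m.MILES
    FROM CAR_MEASUREMENT AS m
    JOIN CAR_CONSTANT AS c ON c.ID = m.ID
    WHERE m.ID = ?
    ORDER BY m.GUID
""", (123,)).fetchall()
print(car_count, measurement_count, toyota)
```

Whether the extra statement per batch is acceptable still depends on how fast the data arrives, as noted above.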

BI - fact table design with incompatible grains

I'm quite new to designing databases for BI, and there is a point I don't understand well.
I'm trying to import French census data, which gives me the population of each city. For each city, the population is broken down by several age classifications that don't map onto each other.
For instance, let's say one classification is 00 to 20 years old, 21 to 59, and 60+.
The other is much more precise: 00 to 02, 03 to 05, etc., but its bounds never line up with those of the first classification: I don't have 15 to 20 but 18 to 22, for example.
So those two classifications are incompatible. How can I use them in my fact table? Should I use two fact tables and two cubes? Or one fact table and two dimensions in one cube? But in that case, won't I double-count facts when I sum to get the total population of a city?
This is national census data with national classifications, so changing them or estimating populations to reconcile the classifications is not an option. And to be clear, one row doesn't relate to one person but to one city: my facts are not individuals but cities' populations.
So the table looks like this:
Line 1 : One city - one amount of population - one code for dim age (ex. 00 to 19 yo) of this population - code (m/f) for the dim gender of that population - date of the census
Line 2 : Same city - one amount of population - one code for dim age (ex. 20 to 34) of this population - code (m/f) for the dim gender - date of the census
And so on for many cities, both genders, and multiple years.
I hope this question is clear enough, as English is not my native language and I'm quite new to databases and BI!
Thanks for helping me with that.

One possible solution using a single fact table and two dimensions for the age ranges:
1 - Categorical range based on the broadest census, for example:
Young 0-20
Adult 21-59
Senior 60+
You could then link the other census to this dimension with approximate values, for example 18-22 could be Young.
2 - Original age range. This dimension can be used for precise age ranges when you report on a single city. It can also help you evaluate the impact of the overlapping bounds (e.g. how many rows fall in the Young / 18-22 range?).
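A rough sketch of how this single-fact-table layout could look, and how to avoid the double counting the question worries about. It uses Python's built-in sqlite3. All table and column names (dim_age, fact_population, scheme, category) are made up for the illustration, the population figures are invented for one hypothetical city, the fine classification is collapsed to four bands, and gender and census date are left out for brevity. The idea is that every age band carries the classification it belongs to, and totals are always computed within one classification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_age (
    age_key   INTEGER PRIMARY KEY,
    scheme    TEXT,   -- which national classification the band belongs to
    age_range TEXT,   -- original band, kept exactly as published
    category  TEXT    -- approximate broad category: Young / Adult / Senior
);
CREATE TABLE fact_population (
    city       TEXT,
    age_key    INTEGER REFERENCES dim_age(age_key),
    population INTEGER
);

INSERT INTO dim_age VALUES
    (1, 'broad', '00-20', 'Young'),
    (2, 'broad', '21-59', 'Adult'),
    (3, 'broad', '60+',   'Senior'),
    (4, 'fine',  '00-17', 'Young'),
    (5, 'fine',  '18-22', 'Young'),   -- approximate: straddles Young and Adult
    (6, 'fine',  '23-59', 'Adult'),
    (7, 'fine',  '60+',   'Senior');

-- Made-up figures for one hypothetical city.
INSERT INTO fact_population VALUES
    ('City A', 1, 100), ('City A', 2, 200), ('City A', 3, 50),
    ('City A', 4, 90),  ('City A', 5, 30),  ('City A', 6, 180), ('City A', 7, 50);
""")

# Summing without regard to the classification counts everyone twice.
naive_total = conn.execute(
    "SELECT SUM(population) FROM fact_population WHERE city = 'City A'"
).fetchone()[0]

# Summing within one classification gives the city's population once.
totals_by_scheme = dict(conn.execute("""
    SELECT d.scheme, SUM(f.population)
    FROM fact_population AS f
    JOIN dim_age AS d ON d.age_key = f.age_key
    WHERE f.city = 'City A'
    GROUP BY d.scheme
"""))

# The broad category shows the effect of the overlapping 18-22 band.
young_by_scheme = dict(conn.execute("""
    SELECT d.scheme, SUM(f.population)
    FROM fact_population AS f
    JOIN dim_age AS d ON d.age_key = f.age_key
    WHERE f.city = 'City A' AND d.category = 'Young'
    GROUP BY d.scheme
"""))
print(naive_total, totals_by_scheme, young_by_scheme)
```

In a cube, the equivalent would be making sure any measure is sliced by one member of the classification attribute before it is summed.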

You can create one dimension as below:
young 1-20
adult 21-59
senior 60+
The classification for each city is then, for example:
young city 1 : 1-20
young city 2 : 4-23
id  field1  field2  field3        field4        ...
1   1       year    young_city_1  other         ...
2   2       year    young_city_1  other         ...
3   3       year    young_city_1  other         ...
4   4       year    young_city_1  young_city_2  ...
Now you can report on any item, using any of the classifications.
I hope this helps.

Database design for voting

I am implementing a voting feature that lets users vote for their favourite images. Each user must vote for exactly 3 images, no more and no fewer, so I am using checkboxes and validating the selection. I need to store these votes in my database.
Here is what I have so far:
voteID | name | emailAddress | ICNo | imageID
(where imageID is a foreign key to the Images table)
I'm still learning about database systems and I feel like this isn't a good database design considering some of the fields like email address and IC Number have to be repeated.
For example,
voteID | name | emailAddress       | ICNo     | imageID
1      | BG   | email@example.com  | G822A28A | 10
2      | BG   | email@example.com  | G822A28A | 11
3      | BG   | email@example.com  | G822A28A | 12
4      | MO   | email2@example.com | G111283Z | 10

You have three "things" in your system - images, people, and votes.
An image can have multiple votes (from different people), and a person can have multiple votes (for different images).
One way to represent this in a diagram is as follows:
So you store information about a person in one place (the Person table), about Images in one place (the Images table), and Votes in one place. The "chicken feet" relationships between them show that one person can have many votes, and one image can have many votes. ("Many" meaning "more than one").
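The three-table layout described above could be sketched like this, using Python's built-in sqlite3 so it runs as-is. The column names follow the question where possible. personID and the UNIQUE constraints are assumptions added for the illustration. The "exactly 3 votes" rule is not enforced here and would still be checked by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (
    personID     INTEGER PRIMARY KEY,       -- assumed surrogate key
    name         TEXT NOT NULL,
    emailAddress TEXT NOT NULL UNIQUE,
    ICNo         TEXT NOT NULL UNIQUE
);
CREATE TABLE Images (imageID INTEGER PRIMARY KEY);
CREATE TABLE Votes (
    voteID   INTEGER PRIMARY KEY,
    personID INTEGER NOT NULL REFERENCES Person(personID),
    imageID  INTEGER NOT NULL REFERENCES Images(imageID),
    UNIQUE (personID, imageID)              -- one vote per person per image
);

-- Each person is stored once; votes only carry the two keys.
INSERT INTO Person VALUES
    (1, 'BG', 'email@example.com',  'G822A28A'),
    (2, 'MO', 'email2@example.com', 'G111283Z');
INSERT INTO Images VALUES (10), (11), (12);
INSERT INTO Votes (personID, imageID) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

votes_per_image = conn.execute(
    "SELECT imageID, COUNT(*) FROM Votes GROUP BY imageID ORDER BY imageID"
).fetchall()

# A second vote by the same person for the same image is rejected.
try:
    conn.execute("INSERT INTO Votes (personID, imageID) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(votes_per_image, duplicate_rejected)
```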

Correct Database Architecture

I don't know how to design my MySQL web database for a shop.
The scenario is a site selling guided tours.
Each tour can be either a Private, a Semi-Private or a Group tour. The price per person changes per tour type. But for Private tours, the price also varies with the number of persons, and it varies by different amounts depending on the tour. How would I create a 'Tour/Product' record?
E.g. let's say:
Tour of Vatican (the tour has various bits of data: name, description, meeting point, duration, etc.). Semi-Private tour costs 50 euro per person. Group tour costs 45 euro per person. Private tour costs 140 euro for 1-2 people, 180 euro for 3 people, 200 euro for 4 people, 225 euro for 5 people, 240 euro for 6 people, and 43 euro per person for 7 people or more.
However, for the Tour of Coliseum (the tour has the same bits of data: name, description, meeting point, duration, etc.), Semi-Private costs 40 per person. Group costs 25 per person. Private tour costs 100 euro for 1-2 people, 135 euro for 3 people, 160 euro for 4 people, 175 euro for 5 people, 180 euro for 6 people, and 25 euro per person for 7 people or more.
How would I structure the data in the database: 2 tables? 3 tables?
Totally confused....
Thanks
Tom

From what I understand from your post, the price depends on three different things:
The tour: the price for the Tour of Vatican differs from the price for the Tour of Coliseum.
The type of the tour: Private, Semi-Private or Group.
The number of persons on the tour.
Since there is no single (constant) price per person for any of the given options, I would go for a three-table approach.
The diagram is detailed in the picture below and works under the following assumptions:
There are three tables: Tour (containing the description of each individual tour), TourPriceOptions (containing the individual price option records) and TourType (which will at all times contain just three records: Private, Semi-Private and Group).
There are just two assumptions you have to make:
A tour can have multiple price options (one-to-many relationship).
A price option has exactly one tour type (a many-to-one relationship, since several price options can share the same type).
How to code this up:
Whenever the administrator of the store creates another tour in the backend, they should be able to add multiple price options. To do this you will need to:
Create a new tour: a function that inserts a new entry into the Tour table.
Get the id of the newly created tour: if only one person adds information at any given time, a function that returns the id of the most recently added tour is a reasonable bet.
Add pricing options based on id_tour: insert each new price option with the id_tour value. Remember to assign a tour_type from one of the predefined categories.
Whenever you want to return these values, write a query that retrieves the price options for the tour the user is currently browsing.
Additional things to research: dynamic forms. They will help when you don't know how many price options an admin might want to add for a specific tour.
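The steps above can be sketched as follows with Python's built-in sqlite3, standing in for MySQL. The table names come from the answer. The columns min_persons, max_persons, price and per_person, and the helper total_price, are assumptions made for the illustration. The sketch also assumes that the Private prices for 1 to 6 people are flat totals and that only the "7 or more" tier is per person. Reading the id from the cursor that performed the insert avoids guessing which tour was added last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TourType (id_tour_type INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Tour (id_tour INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE TourPriceOptions (
    id_price_option INTEGER PRIMARY KEY,
    id_tour         INTEGER NOT NULL REFERENCES Tour(id_tour),
    id_tour_type    INTEGER NOT NULL REFERENCES TourType(id_tour_type),
    min_persons     INTEGER NOT NULL,
    max_persons     INTEGER,            -- NULL means no upper bound
    price           INTEGER NOT NULL,   -- euro
    per_person      INTEGER NOT NULL    -- 1: price is per person, 0: flat total
);
INSERT INTO TourType VALUES (1, 'Private'), (2, 'Semi-Private'), (3, 'Group');
""")

# Steps 1 and 2: create the tour and read back its generated id.
cur = conn.execute("INSERT INTO Tour (name) VALUES (?)", ("Tour of Vatican",))
id_tour = cur.lastrowid

# Step 3: add the price options (type, min, max, price, per_person).
options = [
    (2, 1, None, 50, 1),   # Semi-Private: 50 per person
    (3, 1, None, 45, 1),   # Group: 45 per person
    (1, 1, 2, 140, 0),     # Private tiers, assumed to be flat totals
    (1, 3, 3, 180, 0),
    (1, 4, 4, 200, 0),
    (1, 5, 5, 225, 0),
    (1, 6, 6, 240, 0),
    (1, 7, None, 43, 1),   # Private, 7 or more: 43 per person
]
conn.executemany(
    "INSERT INTO TourPriceOptions "
    "(id_tour, id_tour_type, min_persons, max_persons, price, per_person) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(id_tour, *o) for o in options],
)

def total_price(tour_id, tour_type, persons):
    """Total price in euro for a booking, or None if no option matches."""
    row = conn.execute("""
        SELECT CASE WHEN o.per_person = 1 THEN o.price * :n ELSE o.price END
        FROM TourPriceOptions AS o
        JOIN TourType AS t ON t.id_tour_type = o.id_tour_type
        WHERE o.id_tour = :tour AND t.name = :type
          AND :n >= o.min_persons
          AND (o.max_persons IS NULL OR :n <= o.max_persons)
    """, {"tour": tour_id, "type": tour_type, "n": persons}).fetchone()
    return row[0] if row else None

print(total_price(id_tour, "Private", 3), total_price(id_tour, "Private", 8))
```

In MySQL the id of the new row is available per connection through LAST_INSERT_ID(), so the same pattern should hold even when several administrators add tours at once.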

How to merge rows of SQL data on column-based logic?

I'm writing a margin report on our General Ledger and I've got the basics working, but I need to merge the rows based on specific logic and I don't know how...
My data looks like this:
value1  value2  location  date        category                     debitamount  creditamount
2029    390     ACT       2012-07-29  COSTS - Widgets and Gadgets  0.000        3.385
3029    390     ACT       2012-07-24  SALES - Widgets and Gadgets  1.170        0.000
And my report needs to display the two columns together like so:
plant  date        category             debitamount  creditamount
ACT    2012-07-29  Widgets and Gadgets  1.170        3.385
The logic for joining them is contained in the value1 and value2 columns. Where the last 3 digits of value1 and all three digits of value2 are the same, the rows should be combined. Also, the 1st digit of value1 will always be 2 for sales and 3 for costs (not sure if that matters).
I.e. 2029-390 is money coming in for Widgets and Gadgets sold to customers, while 3029-390 is money being spent to buy the Widgets and Gadgets from suppliers.
How can I do this programmatically in my stored procedure? (SQL Server 2008 R2)
Edit: Would I load the 3000s into one table variable and the 2000s into another, then join the two on value2 and right(value1, 3)? Or something like that?

Try this:
SELECT RIGHT(LTRIM(RTRIM(value1)), 3), value2, MAX(location),
       MAX(date), MAX(category), SUM(debitamount), SUM(creditamount)
FROM table1
GROUP BY RIGHT(LTRIM(RTRIM(value1)), 3), value2
It will sum the credit and debit amounts. For the other columns it picks the maximum value in each group; assuming those values are always the same when value2 and the last 3 digits of value1 match, it shouldn't matter which one is chosen.
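Here is a small runnable version of the same grouping against the two sample rows, using Python's built-in sqlite3 as a stand-in for SQL Server. SQLite has no RIGHT(), so substr(x, -3) plays that role. The column types are guesses. Stripping the "COSTS - " / "SALES - " prefix from category is an extra step that is not in the answer above; it is added because MAX(category) alone would return the full "SALES - ..." string rather than the "Widgets and Gadgets" shown in the desired report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table1 (
        value1 TEXT, value2 INTEGER, location TEXT, date TEXT,
        category TEXT, debitamount REAL, creditamount REAL
    )
""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2029", 390, "ACT", "2012-07-29", "COSTS - Widgets and Gadgets", 0.000, 3.385),
        ("3029", 390, "ACT", "2012-07-24", "SALES - Widgets and Gadgets", 1.170, 0.000),
    ],
)

merged = conn.execute("""
    SELECT substr(trim(value1), -3) AS account,      -- RIGHT(..., 3) in SQL Server
           value2,
           MAX(location),
           MAX(date),
           -- Assumption: category always has the form '<PREFIX> - <name>'.
           MAX(substr(category, instr(category, ' - ') + 3)) AS category,
           SUM(debitamount),
           SUM(creditamount)
    FROM table1
    GROUP BY substr(trim(value1), -3), value2
""").fetchall()
print(merged)
```

Note that the two sample rows have different dates, so MAX(date) reports the later one (2012-07-29), which happens to match the desired output.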