Merge two tables with common values by adding the count, append rest - sql-server

I have two tables, created at runtime, which pull data from several entirely different tables through joins and complex WHERE clauses. In the end, I have two tables with columns something like this:
TableCars:
Id  Company   Views
-------------------
01  Honda     12
32  audi      6
18  BMW       3
17  Vector    5
TableBikes:
Id  Company   Views
-------------------
01  Honda     3
32  audi      1
19  Kawasaki  2
Note:
Company names and ids will always be the same in these two tables, e.g. Honda = 01.
I want to merge these two tables in the sense that I don't want any repetitions in the names (or ids) and I want the views to be added together. Is there any way to do this without using a while loop and a whole lotta hair loss?
The resulting table should be something like this:
ResultantTable
Id  Company   Views
-------------------
01  Honda     15
32  audi      7
18  BMW       3
17  Vector    5
19  Kawasaki  2
Many thanks in advance.
P.S. I tried searching Google and came across the MERGE statement, then looked that up, but MSDN was way over my head. I never understand ANYTHING on MSDN. I wonder if there are any other people like me.

You could try this:
SELECT company, SUM(views) FROM
(SELECT company, views FROM first
UNION ALL
SELECT company, views FROM second) as t
GROUP BY company
Let me know if you have any issues

select id, company, sum(views) views
from
(
select id, company, views
from tablecars
union all
select id, company, views
from tablebikes
) joined
group by id, company
You can always turn SELECT statements into a CREATE table statement:
select id, company, sum(views) views
into NewTableName
from
(
select id, company, views
from tablecars
union all
select id, company, views
from tablebikes
) joined
group by id, company
I don't normally recommend this method for creating permanent tables. It is better to create the table and manually define keys, constraints, defaults, and indexes. If you create it this way, you can still add the required keys later using ALTER TABLE statements.
If you will always have the two base tables around, and you need this "joined" data for querying/reporting, and it has to keep in sync with the base tables - then what you are really after is a VIEW using the first SELECT statement.
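A minimal, runnable sketch of the view approach, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere). The view name merged_views is hypothetical; the table names and data come from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablecars  (id TEXT, company TEXT, views INTEGER);
    CREATE TABLE tablebikes (id TEXT, company TEXT, views INTEGER);
    INSERT INTO tablecars  VALUES ('01','Honda',12), ('32','audi',6), ('18','BMW',3), ('17','Vector',5);
    INSERT INTO tablebikes VALUES ('01','Honda',3), ('32','audi',1), ('19','Kawasaki',2);

    -- Hypothetical view name; a view always reflects the current base tables.
    CREATE VIEW merged_views AS
    SELECT id, company, SUM(views) AS views
    FROM (
        SELECT id, company, views FROM tablecars
        UNION ALL
        SELECT id, company, views FROM tablebikes
    ) AS joined
    GROUP BY id, company;
""")

merged = conn.execute(
    "SELECT id, company, views FROM merged_views ORDER BY id"
).fetchall()
for row in merged:
    print(row)
```

Because the view is just a stored query, later inserts into either base table show up in it without any extra work.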

Related

How to represent data with meaning dependent on other columns?

This question is about relational databases like postgresql or oracle. The following is a toy example of my problem. Say I have five tables
Client (id, name)
1 Joe
2 Ted
Factory (id, name, salary)
1 BMW 20
2 Porsche 30
Farm (id, name, salary)
1 Wineyard 10
2 Cattle farm 5
Occupation
1 Farmer
2 Worker
Client_Occupation (client-id, occupation-id, dependent-id)
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 2 (Porsche)
To find out how much Joe earns, the SQL needs to use data from either the Farm table (if the occupation id is 1) or the Factory table (if the occupation id is 2). This creates very convoluted queries and makes the processing code more complex than it should be.
Is there a better way to structure this data if I do not want to merge the factories and farms tables?
In other words: the table Client_Occupation has a conditional relation to either Farm or Factory. Is there a better way to represent this information?
               Client
                 ^
                 |
Factory <- Client_Occupation -> Farm
                 |
                 v
             Occupation
This comes up often when modeling OO hierarchies, and is a table inheritance pattern. Generally, I prefer Class Table Inheritance over Single Table Inheritance, but they both have their use cases.
Single Table Inheritance is really easy: just take your Farm and Factory tables and merge them together into a superset. Done and done. Unfortunately, niceties like constraints become difficult, and you'll find yourself writing a lot of CASE expressions on the discriminator column.
Class Table Inheritance takes a little time to grok, but is actually pretty simple as well. You design a base table w/common attributes and a discriminator column, and then "sub" tables w/specific attributes for each level of the hierarchy. The sub tables are linked back to the base table by Id and Discriminator, which provides some query optimizer gains and prevents a single entity from being multiple types.
For your example problem, you really don't have any specific attributes on the leaves, so this will look a little odd, but the pattern still works (note that I'm renaming "dependent-id" to "location-id" to make it more understandable):
Occupation (id, name)
1 Farmer
2 Worker
# the base table for all locations w/common attributes
Location (id, type, name, salary)
# unique constraint on the combination of id and type so that we can reference it in a foreign key
PK(id), UC(id, type)
1 Farm Wineyard 10
2 Farm Cattle farm 5
3 Factory BMW 20
4 Factory Porsche 30
Client_Occupation (client-id, occupation-id, location-id)
FK location-id => location.id
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 4 (Porsche)
Farm
# carry the discriminator column and check constraint it; any Farm specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Farm'
Farm (id, type)
1 Farm
2 Farm
Factory
# carry the discriminator column and check constraint it; any Factory specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Factory'
Factory (id, type)
3 Factory
4 Factory
You can now easily get salary for everything applicable:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
And, if you want to get specific Farm columns, or limit it to Farm locations:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
JOIN Farm F ON L.id = F.id AND L.type = F.type -- technically, joining on type is unnecessary, but I prefer to include it anyway
And, if you want to emulate the Single Table Inheritance model:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
LEFT OUTER JOIN Farm F ON L.id = F.id AND L.type = F.type
LEFT OUTER JOIN Factory T ON L.id = T.id AND L.type = T.type
On occasion, it's difficult to refactor an existing table design into this pattern. In that case, a view for your base table provides a lot of the same querying benefits:
CREATE VIEW Location AS
SELECT id, 'Farm' as type, name, salary FROM Farm
UNION ALL
SELECT id, 'Factory' as type, name, salary FROM Factory
At that point, you will (probably) have conflicting ids (e.g., a FarmId = 1 and a FactoryId = 1), so you'll need to be diligent about including the type column in any joins.
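The base-table-plus-sub-table layout described in this answer can be sketched as follows. This is an illustration under assumptions, not the answerer's exact schema: it uses SQLite through Python's sqlite3 module, underscores instead of hyphens in column names, and only the Farm sub-table. It shows the composite foreign key and CHECK constraint keeping a Factory location from also being registered as a Farm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE location (
        id     INTEGER PRIMARY KEY,
        type   TEXT NOT NULL CHECK (type IN ('Farm', 'Factory')),
        name   TEXT NOT NULL,
        salary INTEGER NOT NULL,
        UNIQUE (id, type)              -- target for the composite foreign keys
    );
    CREATE TABLE farm (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL CHECK (type = 'Farm'),
        FOREIGN KEY (id, type) REFERENCES location (id, type)
    );
    CREATE TABLE client_occupation (
        client_id     INTEGER,
        occupation_id INTEGER,
        location_id   INTEGER REFERENCES location (id)
    );
    INSERT INTO location VALUES (1, 'Farm', 'Wineyard', 10), (4, 'Factory', 'Porsche', 30);
    INSERT INTO farm VALUES (1, 'Farm');
    INSERT INTO client_occupation VALUES (1, 1, 1), (2, 2, 4);
""")

# One join answers "what does each client earn", whatever the location type.
salaries = conn.execute("""
    SELECT co.client_id, l.name, l.salary
    FROM client_occupation AS co
    JOIN location AS l ON co.location_id = l.id
    ORDER BY co.client_id
""").fetchall()

# Location 4 is a Factory, so (4, 'Farm') has no matching parent row
# and the composite foreign key rejects the insert.
try:
    conn.execute("INSERT INTO farm VALUES (4, 'Farm')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The salary lookup no longer needs any conditional logic on the occupation; the sub-tables only matter when type-specific columns are wanted.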
Ideally, Farm and Factory should be merged to Workplace and then Workplace could have additional attribute called Type which could take the value Farm or Factory. So the table would look somewhat like this:
Workplace
---------
Id Name Salary Type
-- ---- ------ ----
1 BMW 20 Factory
2 Cattle Farm 5 Farm
.
.
Your problem's root is bad design, not a lack of skill to write good enough queries.
Now that it is fixed in the question that you can't merge Farm and Factory, answering your question:
Is there a better way to structure this data if I do not want to merge the factories and farms tables?
However you structure the data, as long as the tables are separate, you will have to write the logic that figures out which table to query. So that convolution is inevitable if you keep Farm and Factory separate and otherwise follow proper design. But you can hide the convolution behind the scenes: create views over the tables you often have to join. Then, rather than querying across three tables, you write the query against one table and the view built from the other two.

SQL JOIN all tables from one data

I am trying to get all the data from all tables in one DB.
I have looked around, but I haven't been able to find a solution that works for my problem.
I made a C# program that creates a table for each day the program runs. The table name looks like tbl18_12_2015 for today's date (Danish date format).
Now, in order to make a yearly report, I would love to get ALL the data from all the tables in the DB that stores these reports. I have no way of knowing how many tables there will be or what they are called, other than the format (tblDD_MM_YYYY).
I'm thinking of something like this (which obviously doesn't work):
SELECT * FROM DB_NAME.*
All the tables have the same columns, and one of them is an auto-incrementing primary key.
Here is a table named tbl17_12_2015
ID  PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
3   92545     TOM       20,5     A     NULL    NULL
4   92545     TOM       20,5     A     NULL    NULL
6   117681    LISA      NULL     NULL  207     R
Here is a table named tbl18_12_2015
ID  PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
3   117681    LISA      30       A     NULL    NULL
4   53694     DAVID     78       A     NULL    NULL
6   58461     MICHELLE  NULL     NULL  207     R
What I would like to get is something like this (from all tables in the DB):
PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
92545     TOM       20,5     A     NULL    NULL
92545     TOM       20,5     A     NULL    NULL
117681    LISA      NULL     NULL  207     R
117681    LISA      30       A     NULL    NULL
53694     DAVID     78       A     NULL    NULL
58461     MICHELLE  NULL     NULL  207     R
I have tried several different queries, but none of them returned this, just a lot of info about the tables.
Thanks in advance, and happy holidays.
Edit: corrected the tbl18_12_2015 column 3 header to English rather than Danish.
Thanks to all those who tried to help me solve this, but I can't (most likely due to my skill set) get the UNION to work, so I decided to refactor my DB.
While you could store the table names in a database and use dynamic sql to union them together, this is NOT a good idea and you shouldn't even consider it - STOP NOW!!!!!
What you need to do is create a new table with the same fields - and add an ID (auto-incrementing identity column) and a DateTime field. Then, instead of creating a new table for each day, just write your data to this table with the DateTime. Then, you can use the DateTime field to filter your results, whether you want something from a day, week, month, year, decade, etc. - and you don't need dynamic sql - and you don't have 10,000 database tables.
I know some people posted comments expressing the same sentiments, but, really, this should be an answer.
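A rough sketch of that single-table design, using Python's sqlite3 module so it runs as-is. The table name reports, the column names, and the specific dates are all assumptions for illustration (dates are stored as ISO 8601 text, which sorts chronologically); only the person ids, names and amounts are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical names; this one table replaces every tblDD_MM_YYYY table.
    CREATE TABLE reports (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        report_date TEXT NOT NULL,      -- ISO 8601 date, e.g. '2015-12-18'
        person_id   INTEGER NOT NULL,
        name        TEXT NOT NULL,
        payment     REAL,
        result      INTEGER
    );
    INSERT INTO reports (report_date, person_id, name, payment, result) VALUES
        ('2015-12-17',  92545, 'TOM',   20.5, NULL),
        ('2015-12-17', 117681, 'LISA',  NULL, 207),
        ('2015-12-18', 117681, 'LISA',  30,   NULL),
        ('2016-01-04',  53694, 'DAVID', 78,   NULL);  -- illustrative date
""")

def rows_between(start, end):
    """Return rows with start <= report_date < end (ISO dates compare as text)."""
    return conn.execute(
        "SELECT person_id, name, payment, result FROM reports "
        "WHERE report_date >= ? AND report_date < ? ORDER BY id",
        (start, end),
    ).fetchall()

year_2015 = rows_between("2015-01-01", "2016-01-01")   # yearly report
one_day = rows_between("2015-12-18", "2015-12-19")     # what one daily table held
```

The yearly report and the old per-day view become the same query with different bounds, and no table names have to be discovered at run time.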
If you had all the tables in the same database, you would be able to use the UNION operator to combine all your tables.
Maybe you can do something like this to select all the table names from a given database:
For SQL Server:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='dbName'
For MySQL:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA='dbName'
Once you have the list of tables, you can move all the tables into one database and create your report using UNIONs.
You will need to use a UNION between each select query.
Do not use *, always list the name of the columns you are bringing up.
If you want duplicates, then UNION ALL is what you want.
If you want unique records based on PERSONID, but the rows are likely to differ, then an UPDATE_DATE column would probably be useful for deciding which one to use. But what if records with the same PERSONID have each lived a life of their own on each side?
You'd need to define business rules for which specific changes to keep and merge into the single resulting record, and you'd be on your own there.
What is "Skyttenavn"? Is it Danish? If it is the same as "NAME", you'd want to alias that column as 'NAME' in the select query, although it's the order of the columns as listed that counts when determining what to unite.
You'd need a new auto-incremented ID as a unique primary key, by the way, if you are likely to have conflicting IDs. If you instead want to carry the existing IDs over into the new identity column, set IDENTITY_INSERT to ON for that insert, then back to OFF to resume natural incrementation.

Database schema for end user report designer

I'm trying to implement a feature whereby, apart from all the reports that I have in my system, I will allow the end user to create simple reports. (not overly complex reports that involves slicing and dicing across multiple tables with lots of logic)
The user will be able to:
1) Select a base table from a list of allowable tables (e.g., Customers)
2) Select multiple sub tables (e.g., Address table, with AddressId as the field to link Customers to Address)
3) Select the fields from the tables
4) Have basic sorting
Here's the database schema I currently have. I'm quite certain it's far from perfect, so I'm wondering what else I can improve.
AllowableTables table
This table will contain the list of tables that the user can create their custom reports against.
Id  Table
-------------
1   Customers
2   Address
3   Orders
4   Products
ReportTemplates table
Id  Name                MainTable
---------------------------------
1   Customer Report #2  Customers
2   Customer Report #3  Customers
ReportTemplateSettings table
Id  TemplateId  TableName  FieldName  ColumnHeader  ColumnWidth  Sequence
-------------------------------------------------------------------------
1   1           Customer   Id         Customer S/N  100          1
2   1           Customer   Name       Full Name     100          2
3   1           Address    Address1   Address 1     100          3
I know this isn't complete, but this is what I've come up with so far. Does anyone have any links to a reference design, or have any inputs as to how I can improve this?
This needs a lot of work even though it's a relatively simple task. Here are several other columns you might want to include, as well as some other details to take care of.
Store the table name along with the schema name, or store the schema name in an additional column; add columns for sorting and sort order.
Create an additional table to store the child tables used in the report (report id, schema name, table name, column in child table used to join tables, column in parent table used to join tables, join operator (may not be needed if it is always =)).
Create an additional table that will store column names (report id, schema name, table name, column name, display as).
There are probably several more things that will come up after you complete this, but hopefully this will get you headed in the right direction.

Viable ways to have an AutoNumber per user (or per entity)?

Let's say you have a web application that manages books for book sellers, and it is built on a multi-tenant database with a single books table that contains books from several book sellers.
Now let's say that each book seller really wants each of their books to have a unique number associated with it so they can look books up by that number, but it's important to them that the number is roughly consecutive for them. (It's OK if there are small breaks in the sequence due to deleted books and other events that cause an AutoNumber to get consumed but not used).
Obviously each book already has a unique number (primary key) associated with it that is generated via AutoNumber and is unique across book sellers. That is not what I am discussing here.
Let's just assume SQL-Server from here on, but the discussion applies equally to Oracle (except that Oracle uses Sequences that are independent of tables, and the current version of SQL Server must use a table to accomplish the same thing).
We want a number that increments safely in the context of a book seller. We want to maintain the benefits of using AutoNumber, but we want there to be one sequence per book seller. It seems like there are two options, and neither is very good:
1) Create one single-column table per book seller. This scares me because I can't think of another example of dynamically changing the schema (adding a new table whenever a new book seller is added to the system via the web application) in a web request. It also seems really heavyweight to have one table per book seller. I know a future version of SQL-server will support Sequences, but even that would still be a schema change at run time.
2) Roll your own auto-numbering behavior. This seems really risky because databases' built-in AutoNumber features take care of a lot of stuff for you, and giving that up is a big deal. Attempts to re-implement it yourself are probably error-prone and may cause poorer concurrency than the built-in AutoNumber.
Hopefully there are additional options that I'm missing. Has anyone successfully dealt with a similar situation? Thanks.
Is there a reason you couldn't have a two-field table with:
BookSeller_ID, BookID
You wouldn't need to change the schema as you add sellers, and it would be trivial to track the latest number per seller:
SELECT MAX(BookID)
FROM ThisTable -- the two-field table described above
WHERE BookSeller_ID = 123
For additional info you could also add a Universal_BookID field that links to the unique ID referenced in your 3rd paragraph.
EDIT:
To clarify, if you have sellers 1, 2 and 3 you could have a table like:
SellerID  BookID  BookUniversalID
1         1       123
2         1       456
3         1       999
1         2       1234
1         3       8798
1         4       999
1         5       10000001
3         2       123
3         3       456
You keep track of which seller has which IDs assigned and which actual book each one links to, and to determine the next ID for a seller, just query
SELECT MAX(bookid) FROM ThisTable WHERE SellerID = 1
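One caveat with reading MAX and then inserting is that two concurrent requests could read the same value, which is the concurrency risk the question raises. Below is a minimal sketch of one way to guard against that, assuming SQLite (via Python's sqlite3) where BEGIN IMMEDIATE takes the write lock before the read; other databases would need their own locking or retry strategy. The table name seller_books and the function add_book are hypothetical.

```python
import sqlite3

# isolation_level=None: transactions are started and ended by hand below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE seller_books (
        seller_id INTEGER NOT NULL,
        book_id   INTEGER NOT NULL,
        title     TEXT NOT NULL,
        PRIMARY KEY (seller_id, book_id)   -- also rejects duplicate numbers
    )
""")

def add_book(seller_id, title):
    """Insert a book under the seller's next number and return that number."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(book_id), 0) + 1 FROM seller_books WHERE seller_id = ?",
            (seller_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO seller_books (seller_id, book_id, title) VALUES (?, ?, ?)",
            (seller_id, next_id, title),
        )
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

numbers = [add_book(1, "A"), add_book(2, "B"), add_book(1, "C")]
```

The composite primary key is a second line of defence: even if the locking were wrong, a duplicate number would fail loudly instead of being stored.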
DENSE_RANK, works in SQL Server and Oracle
Assuming your table looks vaguely thus
CREATE TABLE dbo.BOOKS
(
internal_book_id int identity(1,1) primary key
, seller_id int NOT NULL
, title varchar(50) NOT NULL
)
Whenever you present the identity value to the seller, use the dense_rank() function to generate the surrogate values.
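CREATE VIEW dbo.BOOK_TO_SELLER_MAP
AS
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
dbo.BOOKS B
-- A view cannot take a parameter, so filter by seller when querying it:
-- SELECT * FROM dbo.BOOK_TO_SELLER_MAP WHERE seller_id = @sellerId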
For the combination of seller_id and the generated id, you ought to always match back to the true id (assuming no physical deletes).
Demo code
;
WITH BOOKS (internal_book_id, seller_id, title)
AS
(
SELECT 1, 100, 'Secret of NIMH'
UNION ALL SELECT 2, 400, 'Once and Future King'
UNION ALL SELECT 7, 88, 'Microsoft SQL Server 2008'
UNION ALL SELECT 8, 100, 'Bonfire of the Vanities'
UNION ALL SELECT 9, 100, 'Canary Row'
UNION ALL SELECT 10, 400, '1916'
UNION ALL SELECT 11, 100, 'The Picture of Dorian Gray'
UNION ALL SELECT 12, 88, 'The Disasters of War'
)
, BOOK_TO_SELLER_MAP AS
(
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
BOOKS B
)
SELECT
*
FROM
BOOK_TO_SELLER_MAP V
ORDER BY
V.seller_id
, V.unique_book_id_for_seller
Results
internal_book_id  seller_id  title                       unique_book_id_for_seller
7                 88         Microsoft SQL Server 2008   1
12                88         The Disasters of War        2
-------------------------------------------------------------------------------
1                 100        Secret of NIMH              1
8                 100        Bonfire of the Vanities     2
9                 100        Canary Row                  3
11                100        The Picture of Dorian Gray  4
-------------------------------------------------------------------------------
2                 400        Once and Future King        1
10                400        1916                        2
OMG Ponies is correct that sequences are the only correct way to achieve this. There isn't really another viable option.

MS Access row number, specify an index

Is there a way in MS Access to return the rows of a dataset between two specific positions?
So let's say my dataset is:
rank | first_name | age
1    | Max        | 23
2    | Bob        | 40
3    | Sid        | 25
4    | Billy      | 18
5    | Sally      | 19
But I only want to return the records between 'rank' 2 and 4, so that my result set is Bob, Sid and Billy. However, rank is not part of the table; it should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted, the numbering will be inconsistent. And what if I wanted the results in reverse?
This is obviously very simple. The reason I ask is that I am working on a product catalogue and am looking for a more efficient way of paging through the returned dataset. If I return only one page's worth of data from the database, that is obviously going to be quicker than returning the complete set of 3000 records and then having to subselect from that set.
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after the comment that rank does not exist in the table structure:
SELECT TOP 100 * FROM table ORDER BY ID;
And if you want subsequent results, take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
SELECT TOP 100 * FROM table WHERE ID > 100 ORDER BY ID;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is that the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to degrade noticeably beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
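To make the correlated-subquery approach concrete, here is a small sketch that wraps the ranking query in an outer SELECT so a page can be picked by rank range. It runs against SQLite through Python's sqlite3 module rather than Access (an assumption made so the example is runnable), and it ranks by first_name as guessed in this answer. The helper page and the alias name_rank are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (first_name TEXT, age INTEGER);
    INSERT INTO MyTable VALUES
        ('Max', 23), ('Bob', 40), ('Sid', 25), ('Billy', 18), ('Sally', 19);
""")

def page(first_rank, last_rank):
    """Rows whose name-based rank falls in [first_rank, last_rank]."""
    return conn.execute("""
        SELECT name_rank, first_name, age
        FROM (
            SELECT T1.first_name, T1.age,
                   (SELECT COUNT(*) + 1
                    FROM MyTable AS T2
                    WHERE T1.first_name > T2.first_name) AS name_rank
            FROM MyTable AS T1
        ) AS ranked
        WHERE name_rank BETWEEN ? AND ?
        ORDER BY name_rank
    """, (first_rank, last_rank)).fetchall()

second_page = page(2, 4)
print(second_page)
```

Because the rank here is alphabetical, ranks 2 to 4 are Bob, Max and Sally rather than the Bob, Sid, Billy of the question, which assumed a different ordering; change the comparison inside the subquery to rank by another column.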
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC
If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE first_name = 'Bob'
OR first_name = 'Sid'
OR first_name = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see the Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field and want to read 3 rows, and you know the ID of the first row you want to read, then you can use TOP to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1  | Max        | 23
2  | Bob        | 40
6  | Sid        | 25
8  | Billy      | 18
15 | Sally      | 19
You can query Bob, Sid and Billy with the following query.
SELECT TOP 3 first_name, age
FROM Table
WHERE ID >= 2
ORDER BY ID

Resources