Merge two tables into a single indexed view in SQL Server 2008 - database

So here is my dilemma: I'm currently storing all my records in one giant flat table that holds movies, episodes, series, games, etc. About 70% of the records are episodes that I do NOT want indexed by the default full-text catalog, since they account for over 2 million records and 90% of the time people are searching for movies/series. I want to move the episodes into a separate table so they can have their own full-text catalog, plus a few additional columns that do not apply to movies/series (season, episode #, etc.).
My issue is with the stored procedure I use to check whether an ID already exists before I go and download or update data for that ID.
I wanted to create a view of all the IDs across both tables so I could look up in a single place whether an ID exists; however, when I UNION the tables, SQL Server will not allow me to create an index on the view.
Table 1 (movies/series)    Table 2 (episodes)
ID | Type                  ID | Type
1  | movie                 2  | episode
3  | movie                 4  | episode
5  | movie                 6  | episode
The ID's are unique and will never have duplicates in either table.
I feel that since this view will have over 2 million records, an index might be important; it is queried 250-500+ times a second, so optimization is a huge factor.
Is there a better approach than using UNION ALL to get ID's 1,2,3,4,5,6 into a single view?

Probably your best solution is to create a view that does UNION ALL of the two tables, but instead of trying to index the view, make sure you have a good indexing strategy implemented on the tables themselves.
You can use Index Hints in your view to force the view to use those indexes whenever someone selects from the view.
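For illustration, here is a minimal sketch of that approach, assuming two hypothetical tables dbo.Movies and dbo.Episodes that each carry a unique ID and a Type column (the names are placeholders, not the asker's actual schema):
-- Unique indexes on the base tables so an existence check is an index seek, not a scan
CREATE UNIQUE INDEX IX_Movies_ID ON dbo.Movies (ID);
CREATE UNIQUE INDEX IX_Episodes_ID ON dbo.Episodes (ID);
GO
-- Plain (non-indexed) view that unions the IDs from both tables
CREATE VIEW dbo.vAllIDs
AS
SELECT ID, Type FROM dbo.Movies
UNION ALL
SELECT ID, Type FROM dbo.Episodes;
GO
-- Existence check in the stored procedure; each branch of the UNION ALL can use its own index
DECLARE @id INT = 3;
IF EXISTS (SELECT 1 FROM dbo.vAllIDs WHERE ID = @id)
    PRINT 'ID already exists';
Because every branch of the UNION ALL can seek on a unique index of its own base table, the lookup stays cheap even though the view itself cannot be indexed.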

Related

SQL Server to allow multiple values in a column

I'm using SQL Server as the back end and MS Access as the front end. In Access I can create a table with the lookup wizard, which lets me build a combo box on the form and select multiple values in the drop-down. For example,
with lookup values defined as Drama, Adventure, Horror, etc., I can select multiple genres for a movie in the combo box and store them in a single cell in the table.
How do I replicate this with SQL Server as the back end? I tried entering multiple values in the default value and it won't work. Is this possible with SQL Server?
Storing many values in a single column is a big no-no in relational database design.
What you are trying to do here is model a many-to-many relationship: you have multiple items (films or whatever) and you have multiple genres.
That is the classic many-to-many case. Usually you have three tables: one representing the items, another representing the genres, and in between a third table that has foreign keys to both the items and the genres.
In the end you will end up with a table structure like this:
Items           ItemToGenre           Genres
ID | item       ItemID | GenreID      ID | genre
Put foreign keys on ItemToGenre.ItemID referencing Items.ID and on ItemToGenre.GenreID referencing Genres.ID.
Then you can JOIN your tables to get all the genres a specific item has, without breaking atomicity.
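A minimal sketch of that structure in SQL, using the table and column names from the layout above (types and the sample query are illustrative, not a definitive design):
CREATE TABLE Items (
    ID   INT PRIMARY KEY,
    item VARCHAR(255) NOT NULL
);
CREATE TABLE Genres (
    ID    INT PRIMARY KEY,
    genre VARCHAR(100) NOT NULL
);
CREATE TABLE ItemToGenre (
    ItemID  INT NOT NULL REFERENCES Items (ID),
    GenreID INT NOT NULL REFERENCES Genres (ID),
    PRIMARY KEY (ItemID, GenreID)   -- one row per item/genre pair
);
-- All genres attached to item 1
SELECT g.genre
FROM ItemToGenre ig
JOIN Genres g ON g.ID = ig.GenreID
WHERE ig.ItemID = 1;
The front end (Access in this case) then inserts one ItemToGenre row per selected genre instead of packing several values into a single cell.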

How to optimize SQL Query when you have more than 100 where Clause

Currently I have a view called vStoreProduct which has the following columns:
DOEntry | StoreID | UpcCode | Value
My users filter on StoreID and UpcCode; I have more than 500 stores and more than 700 UPC codes.
In my front end or user interface, the user can select anything, i.e.
All stores & all products
Some stores or some products
Now the outcome SQL Query is something like this
select count(*) from vStoreProduct where StoreID in ( ..................) and
UpcCode in (.....................)
Currently even a count is taking more than 3 mins for a view of 500,000 records.
Is this the best approach, or would you recommend something else?
Thanks
What is the code in vStoreProduct? Is it just a simple select over one (or more) base tables, or does it contain more logic that is the root cause of the slowness? If it's complex, can you use an indexed view instead?
If the view is simple, are the fields you filter on indexed? Have you looked at the statistics io output or checked for heavy operations in the execution plan (table/index scans, sorts and key lookups for large numbers of rows, spools)?
If the view contains 500 000 rows, how many of these are you fetching when it takes 3 minutes? How many rows are in the base tables?
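If the view does turn out to be a thin wrapper over a single base table, here is a hedged sketch of the kind of diagnostics and index that could help (the base table name StoreProduct and the placeholder filter values are assumptions, not the asker's schema):
-- Measure logical reads and see where the time goes
SET STATISTICS IO ON;
SELECT COUNT(*)
FROM vStoreProduct
WHERE StoreID IN (1, 2, 3)            -- placeholder store ids
  AND UpcCode IN ('0001', '0002');    -- placeholder UPC codes
SET STATISTICS IO OFF;
-- An index on the assumed base table that covers the filter columns,
-- so the COUNT can be answered from the index instead of a full table scan
CREATE INDEX IX_StoreProduct_Store_Upc ON dbo.StoreProduct (StoreID, UpcCode);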

One large table or many small ones in database?

Say I want to create a typical todo web app using a db like PostgreSQL. A user should be able to create todo-lists, and on these lists they should be able to make the actual todo-entries.
I regard the todo-list as an object with properties like owner, name, etc., and of course the actual todo-entries, which have their own properties like content, priority, date and so on.
My idea was to create a table for all the todo-lists of all the users. In this table I would store all the attributes of each list. But the question which arises is how to store the todo-entries themselves. Of course in an additional table, but should I rather:
1. Create one big table for all the entries and have a field storing the id of the todo-list they belong to, like so:
todo-list: id, owner, ...
todo-entries: list.id, content, ...
which would give 2 tables in total. The todo-entries table could get very large, although we know that entries expire, so the table only grows with more usage, not over time. Then we would write something like SELECT * FROM todo-entries WHERE todo-list-id=id, where id is the id of the list we are trying to retrieve.
OR
2. Create a todo-entries table on a per user basis.
todo-list: id, owner, ...
todo-entries-owner: list.id, content, ...
The number of entries tables depends on the number of users in the system. Queries would look something like SELECT * FROM todo-entries-owner. This gives mid-sized tables, their size depending on how many entries users make in total.
OR
3. Create one todo-entries-table for each todo-list and then store a generated table name in a field for the table. For instance could we use the todos-list unique id in the table name like:
todo-list: id, owner, entries-list-name, ...
todo-entries-id: content, ... //the id part is the id from the todo-list id field.
In the third case we could potentially have quite a large number of tables, since a user might create many 'short' todo-lists. To retrieve a list we would then simply go along the lines of SELECT * FROM todo-entries-id, where todo-entries-id is either stored in a field of the todo-list or derived implicitly by concatenating 'todo-entries' with the todo-list's unique id. By the way: how do I do that, should this be done in JS or can it be done in PostgreSQL directly? And very related to this: in the SELECT * FROM <tablename> statement, is it possible to use the value of some field of some other table as <tablename>? Like SELECT * FROM todo-list(id).entries-list-name or so.
The three possibilities go from few large to many small tables. My personal feeling is that the second or third solutions are better; I think they might scale better. But I'm not quite sure of that, and I would like to know what the 'typical' approach is.
I could go more in depth of what I think of each of the approaches, but to get to the point of my question:
Which of the three possibilities should I go for? (Or something else entirely; does this have to do with normalization?)
Follow up:
What would the (PostgreSQL) statements then look like?
The only viable option is the first. It is far easier to manage and will very likely be faster than the other options.
Imagine you have 1 million users, with an average of 3 to-do lists each and an average of 5 entries per list.
Scenario 1
In the first scenario you have three tables:
todo_users: 1 million records
todo_lists: 3 million records
todo_entries: 15 million records
Such table sizes are no problem for PostgreSQL, and with the right indexes you will be able to retrieve any data in less than a second. That holds for simple queries; if your queries become more complex (like: get me the todo_entries for the longest todo_list of the top 15% of todo_users that have made fewer than 3 todo_lists in the 3-month period with the most todo_entries entered), they will obviously be slower, as in the other scenarios. The simple queries are very straightforward:
-- Find user data based on username entered in the web site
-- An index on 'username' is essential here
SELECT * FROM todo_users WHERE username = ?;
-- Find to-do lists from a user whose userid has been retrieved with previous query
SELECT * FROM todo_lists WHERE userid = ?;
-- Find entries for a to-do list based on its todoid
SELECT * FROM todo_entries WHERE listid = ?;
You can also combine the three queries into one:
SELECT u.*, l.*, e.* -- or select appropriate columns from the three tables
FROM todo_users u
LEFT JOIN todo_lists l ON l.userid = u.id
LEFT JOIN todo_entries e ON e.listid = l.id
WHERE u.username = ?;
Use of the LEFT JOINs means that you will also get data for users without lists or lists without entries (but column values will be NULL).
Inserting, updating and deleting records can be done with very similar statements and similarly fast.
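For illustration, a minimal sketch of those statements, reusing the hypothetical table and column names from the queries above (the priority and done columns are assumptions):
-- Create a list for a user, then add an entry to it
INSERT INTO todo_lists (userid, name) VALUES (?, 'Groceries') RETURNING id;
INSERT INTO todo_entries (listid, content, priority) VALUES (?, 'Buy milk', 1);
-- Rename a list; mark an entry as done
UPDATE todo_lists SET name = 'Weekend groceries' WHERE id = ?;
UPDATE todo_entries SET done = TRUE WHERE id = ?;
-- Remove one expired entry, or a whole list with its entries
DELETE FROM todo_entries WHERE id = ?;
DELETE FROM todo_entries WHERE listid = ?;
DELETE FROM todo_lists WHERE id = ?;
With a foreign key on todo_entries.listid declared ON DELETE CASCADE, the last two deletes collapse into a single DELETE FROM todo_lists.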
PostgreSQL stores data in "pages" (8 kB by default) and most pages will be filled, which is a good thing because reading and writing a page are very slow compared to other operations.
Scenario 2
In this scenario you need only two tables per user (todo_lists and todo_entries) but you need some mechanism to identify which tables to query.
1 million todo_lists tables with a few records each
1 million todo_entries tables with a few dozen records each
The only practical solution to that is to construct the full table names from a "basename" related to the username or some other persistent authentication data from your web site. So something like this:
username = 'Jerry';
todo_list = username + '_lists';
todo_entries = username + '_entries';
And then you query with those table names. More likely you will need a todo_users table anyway to store personal data, usernames and passwords of your 1 million users.
In most cases the tables will be very small and PostgreSQL will not use any indexes (nor does it have to). It will have more trouble finding the appropriate tables, though, and you will most likely build your queries in code and then feed them to PostgreSQL, meaning that it cannot optimize a query plan. A bigger problem is creating the tables for new users (todo_list and todo_entries) or deleting obsolete lists or users. This typically requires behind-the-scenes housekeeping that you avoid with the previous scenario. And the biggest performance penalty will be that most pages have only a little content, so you waste disk space and spend lots of time reading and writing those partially filled pages.
Scenario 3
This scenario is even worse than scenario 2. Don't do it; it's madness.
3 million tables todo_entries with a few records each
So...
Stick with option 1. It is your only real option.

What is the best table structure for keeping several combo boxes (list boxes)?

I have several list boxes in my web application that the user has to fill. The administrator can add/remove/edit values in the combo boxes from the control panel, so the problem is what the best way is to keep these combo boxes in the database.
One way is keeping a separate table for each combo box. I think this is very easy to handle, but I would have to create more than 20 tables, one per combo/list box, and I wonder whether that is good practice.
Another way is keeping one table for all the combo boxes. But I am worried about deleting data in this case.
If I want to remove India from the country column in the combo box table, then I will have a problem. I may have to update it to null or work around it some other way and handle this on the programming side.
Am I correct? Can you help me?
I think you should just create a table with 3 fields. The first field is the id, the second is the name and the last is the foreign key. For example:
combo_box_table
id - name - box
1 - Japan - 1
2 - India - 1
3 - Scotland - 2
4 - England - 3
You just have to play with the query; each box is identified by the last field: 1 represents combo box 1, 2 represents combo box 2, etc.
select * from combo_box_table where box = 1
If you want to delete India, the query is just delete from combo_box_table where id = 2.
Hope this helps.
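A minimal sketch of that table and the statements above in plain SQL (integer ids as in the example; adapt types and auto-increment syntax to your DBMS):
CREATE TABLE combo_box_table (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    box  INT NOT NULL              -- which combo box this value belongs to
);
INSERT INTO combo_box_table (id, name, box) VALUES
    (1, 'Japan',    1),
    (2, 'India',    1),
    (3, 'Scotland', 2),
    (4, 'England',  3);
-- Populate combo box 1
SELECT name FROM combo_box_table WHERE box = 1;
-- Remove India
DELETE FROM combo_box_table WHERE id = 2;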
Another possibility would be to save the combo box data as an array or a json string in a single field in your table, but whether you want to do this or not depends on how you want your table to function and what your application is. See Save PHP array to MySQL? for further information.
EDIT:
I'm going to assume you have a combo-box with different countries and possibly another with job titles and others.
If you create multiple tables then yes, you would have to use multiple SQL queries, but the amount of data in each table would be flexible and deleting would be a one-step process:
mysqli_query($link,"DELETE FROM Countries WHERE Name='India'");
With the json or array option you could have one table, with one column per combo-box. This would mean you only have to query the table once to populate the combo-boxes, but then you would have to decode the json strings and iterate through them, also checking for null values, for instance if countries had 50 entries but job titles only had 20. There would also be some limits on the amount of data, as the "text" type only has a finite length. (Possible, but a nightmare of code to manage.)
You may have to query multiple times to populate the boxes, but I feel that the first method would be the most organized and flexible, unless I have misinterpreted your database structure needs...
A third possible answer, though very different, could be to use AJAX to populate the combo-boxes from separate .txt files on the server, though editing them and removing or adding options to them through any way other than manually opening the file and typing in it or deleting it would be complex as well.
Unless you have some extra information at the level of the combo-box itself, just a simple table of combo-box items would be enough:
CREATE TABLE COMBO_BOX_ITEM (
    COMBO_BOX_ID INT,
    VALUE        VARCHAR(255),
    PRIMARY KEY (COMBO_BOX_ID, VALUE)
);
To get items of a given combo-box:
SELECT VALUE FROM COMBO_BOX_ITEM WHERE COMBO_BOX_ID = <whatever>
The nice thing about this query is that it can be satisfied by a simple range scan on the primary index. In fact, assuming the query optimizer of your DBMS is clever enough, the table heap is not touched at all, and you can eliminate this "unnecessary" heap by clustering the table (if your DBMS supports clustering). Physically, you'd end up with just a single B-Tree representing the whole table.
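To illustrate that last point, here is a sketch of the same table declared with an explicit clustered primary key in SQL Server syntax (other DBMSs differ; InnoDB in MySQL clusters on the primary key automatically):
-- The clustered primary key makes the B-Tree the table's storage,
-- so the range scan on COMBO_BOX_ID never touches a separate heap
CREATE TABLE COMBO_BOX_ITEM (
    COMBO_BOX_ID INT          NOT NULL,
    VALUE        VARCHAR(255) NOT NULL,
    CONSTRAINT PK_COMBO_BOX_ITEM
        PRIMARY KEY CLUSTERED (COMBO_BOX_ID, VALUE)
);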
Use a single table, Countries, and another for Job Descriptions, set up like so:
Countries
ID | Name | JobsOffered | Jobs Available
_________________________________________
1 | India | 1,2,7,6,5 | 5,6
2 | China | 2,7,5, | 2,7
etc.
Job Descriptions
ID | Name | Description
___________________________________
1 | Shoe Maker | Makes shoes
2 | Computer Analyst | Analyzes computers
3 | Hotdog Cook | Cooks hotdogs well
Then you could query your database for the country and get the jobs that are available (and offered), then simply query the Job Descriptions table for the names and display to the user which jobs are available. When a job is filled or opened, all you have to do is update the country table with the new job ID.
Does this help? (In this case you will need a separate table for each combo-box, as suggested, and you have referencing IDs for the jobs available)
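For completeness, a sketch of the two-step lookup this design implies (JobDescriptions and the JobsAvailable column name are assumed identifiers for the tables pictured above; splitting the comma-separated lists has to happen in application code or with dialect-specific string functions, which is the main cost of this denormalized layout):
-- Step 1: fetch the comma-separated job id lists for a country
SELECT JobsOffered, JobsAvailable FROM Countries WHERE Name = 'India';
-- Step 2: after splitting the list in application code (here 5 and 6), fetch the job names
SELECT ID, Name, Description FROM JobDescriptions WHERE ID IN (5, 6);
-- When a job is filled, rewrite the country's list
UPDATE Countries SET JobsAvailable = '6' WHERE Name = 'India';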

Athletics Ranking Database - Number of Tables

I'm fairly new to this so you may have to bear with me. I'm developing a database for a website with athletics rankings on them and I was curious as to how many tables would be the most efficient way of achieving this.
I currently have 2 tables, a table called 'athletes' which holds the details of all my runners (potentially around 600 people/records) which contains the following fields:
mid (member id - primary key)
firstname
lastname
gender
birthday
nationality
And a second table, 'results', which holds all of their performances and has the following fields:
mid
eid (event id - primary key)
eventdate
eventcategory (road, track, field etc)
eventdescription (100m, 200m, 400m etc)
hours
minutes
seconds
distance
points
location
The second table already has around 2000 records in it, and this could potentially quadruple over time, mainly because there are around 30 track events, 10 field, 10 road, plus cross country, relays, multi-events, etc. With 600 athletes in my first table, that equates to a large number of records in my second table.
So what I was wondering is: would it be cleaner/more efficient to have multiple tables to separate track, field, cross country, etc.?
I want to use the database to order people's results based on their performance. If you would like to understand better what I am trying to emulate, take a look at this website: http://thepowerof10.info
Changing the schema won't change the number of results. Even if you split the venue into a separate table, you'll still have one result per participant at each event.
The potential benefit of having a separate venue table would be better normalization. A runner can have many results, and a given venue can have many results on a given date. You won't have to repeat the venue information in every result record.
You'll want to pay attention to indexes. Every table must have a primary key. Add additional indexes for columns you use in WHERE clauses when you select.
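For example, here is a sketch of indexes that could support the lookups described in the question (which columns you actually index should follow the WHERE clauses you end up writing):
-- All results for one athlete
CREATE INDEX ix_results_mid ON results (mid);
-- All results for one event, e.g. every 100m track performance
CREATE INDEX ix_results_event ON results (eventcategory, eventdescription);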
Here's a discussion about normalization and what it can mean for you.
PS - Thousands of records won't be an issue. Large databases are on the order of giga- or tera-bytes.
My thought --
Don't break your events table into separate tables for each type (track, field, etc.). You'll have a much easier time querying the data back out if it's all there in the same table.
Otherwise, your two tables look fine -- it's a good start.
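As a hedged example of why one results table is easier to query, a ranking for a single event might look something like this (table and column names are taken from the question; how you order timed versus distance events is up to you):
-- Fastest 100m track performances, best time per athlete
SELECT a.firstname, a.lastname, MIN(r.seconds) AS best_seconds
FROM athletes a
JOIN results r ON r.mid = a.mid
WHERE r.eventcategory = 'track'
  AND r.eventdescription = '100m'
GROUP BY a.mid, a.firstname, a.lastname
ORDER BY best_seconds;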
