I have been asked to review a series of indexes before they get added to our SQL Server 2019 database. I am not an index person and don't normally do this, so I am trying to understand why I should add the one below.
The code below creates a non-clustered index with two columns in its INCLUDE list, and those two columns are always NULL. Essentially, I have a Part Master table with three "sub category" fields. We only use one of them and never plan on using the others, so part_subgrp1 is populated while part_subgrp2 and part_subgrp3 are always NULL. My peer keeps saying the index will improve performance, but they cannot explain why, and I can't get past how two blank fields would help optimize searches (at least in any meaningful way).
Here is the code:
-- 5. new index from table partmstr
CREATE NONCLUSTERED INDEX [idx_partmstr_sellable_inclPartSubgrp2] ON [dbo].[partmstr]
(
[sellable], [part_grp]
)
include ([part_subgrp2],[part_subgrp3] )
GO
I don't know whether it matters for the question, but the table has 343 columns and 7,704 records (and I confirmed that those two part subgroup columns are NULL in every one of them).
If he is correct, I just want a better understanding of why, so I don't feel like I am just putting junk in.
Thank you!
I am new, so if I jacked this up I will totally correct it. Thanks!
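One likely rationale for the peer's suggestion is a covering index. If an index contains every column a query reads, the engine can answer that query from the index alone, without going back to the wide base table for each matching row (a key lookup in SQL Server). Included columns help only queries that actually return them, and NULL values do not change that. The sketch below is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for SQL Server. SQLite has no `INCLUDE` clause, so the two subgroup columns are appended as trailing key columns. The `part_id` column and the sample data are made up, and SQLite's plan text differs from SQL Server's execution plans.

```python
import sqlite3

# In-memory stand-in for partmstr; only the columns relevant to the index are
# modeled (the real table has 343 columns). part_id and the data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE partmstr (
           part_id      INTEGER PRIMARY KEY,
           sellable     INTEGER,
           part_grp     TEXT,
           part_subgrp1 TEXT,
           part_subgrp2 TEXT,
           part_subgrp3 TEXT
       )"""
)
conn.executemany(
    "INSERT INTO partmstr (sellable, part_grp, part_subgrp1) VALUES (?, ?, ?)",
    [(i % 2, f"GRP{i % 10}", f"SUB{i % 5}") for i in range(1000)],
)

# SQLite has no INCLUDE clause, so the subgroup columns are added as trailing
# key columns; for the purpose of covering a query the effect is comparable.
conn.execute(
    """CREATE INDEX idx_partmstr_sellable_inclPartSubgrp2
       ON partmstr (sellable, part_grp, part_subgrp2, part_subgrp3)"""
)

def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Reads only indexed columns, so the index alone can answer it.
covered = plan(
    "SELECT part_subgrp2, part_subgrp3 FROM partmstr "
    "WHERE sellable = 1 AND part_grp = 'GRP3'"
)
# Also needs part_subgrp1, so each match requires a lookup into the table.
not_covered = plan(
    "SELECT part_subgrp1 FROM partmstr "
    "WHERE sellable = 1 AND part_grp = 'GRP3'"
)
print(covered)
print(not_covered)
```

If no query in the application selects part_subgrp2 or part_subgrp3 with these filters, the included columns only add to the index size. Checking the queries that prompted the index is the way to settle it.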
I tried asking this question before and it seemed to have gotten swept under the rug.
First things first, here are two pictures showing the table structure and the current output I get in SSIS.
[Image: table diagram]
[Image: current output]
So table three has only one entry. This entry (name) applies to all of the other foreign keys, though. I want the final output to look like my current output, except that instead of the NULLs there should just be ones.
I was able to get this far on my own by researching and learning about the Merge transformations, but I can't find anything on manipulating the data the way I want.
I greatly appreciate any tips or advice you can offer.
EDIT: Since the images apparently can't be seen, I will try to describe them.
The table diagram has four tables. The top one in the hierarchy has a primary key formed from the three foreign keys to the three other tables.
When I try to fill out this table in SSIS, my output has every foreign key ID from the first two tables but only one from the third table; the rest of the third foreign key values are all NULL. I believe this is because that table has only one entry for now, but that entry applies to all of the foreign key IDs, so it should repeat.
It should look like this:
ID1 ID2 ID3
1 1 1
2 2 1
3 3 1
But instead, I am only getting NULLs in the ID3 field after the first record. How do I make the single ID repeat in ID3?
EDIT 2: Some additional screenshots of my data flow and Merge transformation, as requested.
[Image: SSIS data flow]
After working on this for a few weeks, and with a tip from a colleague, I found a solution. Surprisingly, it was quite simple, and I'm slightly shocked that no one on here could provide the answer.
The solution was simply this: in a data source, set the data access mode to SQL Command and enter the following SQL:
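SELECT a.T1ID,
       b.T2ID,
       c.T3ID
FROM Table1 AS a
INNER JOIN Table2 AS b
    ON a.T1ID = b.T2ID
-- Every Table3 row is paired with each joined Table1/Table2 row,
-- so a single Table3 row fills ID3 on every output row.
CROSS JOIN Table3 AS c
ORDER BY a.T1ID ASC;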
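This query works because joining Table3 without a join condition (a cross join) pairs its single row with every Table1/Table2 match. The minimal sketch below uses Python's `sqlite3` to reproduce the desired output. The sample data is made up: three IDs in Table1 and Table2, and one row in Table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (T1ID INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (T2ID INTEGER PRIMARY KEY);
    CREATE TABLE Table3 (T3ID INTEGER PRIMARY KEY);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1), (2), (3);
    INSERT INTO Table3 VALUES (1);  -- the single shared row
    """
)

rows = conn.execute(
    """
    SELECT a.T1ID, b.T2ID, c.T3ID
    FROM Table1 AS a
    INNER JOIN Table2 AS b ON a.T1ID = b.T2ID
    CROSS JOIN Table3 AS c      -- pairs every joined row with every Table3 row
    ORDER BY a.T1ID ASC
    """
).fetchall()

for row in rows:
    print(row)
```

If Table3 ever gains a second row, every output row is repeated once per Table3 row, so this approach relies on Table3 staying at a single row.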
If Table3 will always have just a single row, the simplest solution would be to use an Execute SQL task (Control Flow) to save T3ID to a variable, and then use a Derived Column transformation (Data Flow) to add the variable as a new column.
If that won't work for you (or your data), you can take a look here to see how to fudge the Merge Join transformation to do what you want.
I would like your opinion. I have a table with 120 VARCHAR fields into which I will have to insert about 1,000 records per month for at least 10 years, i.e. at least 120,000 records in total.
I could divide the fields into multiple tables, but I'd rather keep it as it is. Do you think I will have problems in the future?
Thank you
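The volume in the question can be sanity-checked with a quick back-of-the-envelope calculation. This is a minimal Python sketch. The 30-byte average per column is a hypothetical figure, not from the question, and per-row and page overhead are ignored.

```python
# Back-of-the-envelope sizing for the wide table described in the question.
rows_per_month = 1_000
years = 10
columns = 120
avg_bytes_per_column = 30  # hypothetical average VARCHAR payload; measure real data

total_rows = rows_per_month * 12 * years
# Raw column data only; ignores per-row headers, page overhead and indexes.
approx_bytes = total_rows * columns * avg_bytes_per_column

print(f"{total_rows:,} rows")  # 120,000 rows
print(f"~{approx_bytes / 1024**2:,.0f} MiB of column data")
```

Under these assumptions the data comes to a few hundred megabytes, so raw volume alone is unlikely to be the deciding factor; the question is mainly one of design.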
Well, if the data in the columns follows a certain logic, keep it flat, which means I would leave it that way. Otherwise, separate it into multiple tables. It depends on your data.
I once worked with medical data where one table contained over 100 columns, but all of these columns were needed to get a diagnostic result. I don't remember exactly what it was, because I worked with that data set some years ago. In that case, separating the columns into multiple tables would have made things more complicated. Logically, each column served a certain purpose, so it was easier to have them all in the same place (the table).
If you put all the columns together just out of laziness, so that you only have to query one table, I would recommend separating them into different tables to make the data more comfortable to work with and the database schema more understandable.
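Splitting a wide table usually means moving a group of related columns into a second table that shares the same primary key (a one-to-one relationship). This is a minimal sketch with Python's `sqlite3`. The table and column names (`record`, `record_details`, `value_a`, `value_b`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Core columns that most queries need.
    CREATE TABLE record (
        record_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        created_on TEXT NOT NULL
    );
    -- A group of related columns moved to its own table (one-to-one):
    -- its primary key is also a foreign key back to record.
    CREATE TABLE record_details (
        record_id INTEGER PRIMARY KEY REFERENCES record (record_id),
        value_a   TEXT,
        value_b   TEXT
    );
    INSERT INTO record VALUES (1, 'first', '2024-01-01');
    INSERT INTO record_details VALUES (1, 'x', 'y');
    """
)

# Queries that need both groups join them back together on the shared key.
row = conn.execute(
    """
    SELECT r.name, d.value_a, d.value_b
    FROM record AS r
    LEFT JOIN record_details AS d ON d.record_id = r.record_id
    WHERE r.record_id = 1
    """
).fetchone()
print(row)
```

The LEFT JOIN keeps records that have no detail row yet. The trade-off is one extra join for queries that need both column groups.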
Can someone help me learn how to think about fixing data in SQL tables (ideally without just giving me SQL routines I could run)?
OK, this is the situation: suppose I have a single table with a column called ColumnA that has lots of duplicate values. I need to remove all the duplicate entries from the table. The question is: if I had to write pseudocode as a plan, what SQL should be written?
Many thanks to anyone who can offer me any pointers.
Kind Regards
James
I think you've already articulated a very basic pseudocode for the issue when you say you wish to delete duplicate values from ColumnA.
For this example, I would tend to:
1. Find all instances of duplicates.
2. Work out a method of determining which one to keep (Google "MAX N in Group" for ideas; there are good articles here on SO and on DBA Stack Exchange, as well as external articles with examples).
3. Write your delete to cater for the records you identified as unwanted duplicates in the previous step.
When working through these types of issues in SQL Server, I tend to write a series of Common Table Expressions (CTEs) to identify my target records and then delete based on that.
For example:
;WITH Duplicates AS (
-- Write your select query to identify which subset of your records are affected by having duplicate values in Column A
), TargetRows AS (
-- Further select from Duplicates some method of MAX N in Group to identify which of the rows are unwanted
) -- Then here DELETE from your table based upon your findings from above
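The template can be filled in as follows. This is a minimal sketch using Python's `sqlite3` with a hypothetical `items` table (an `id` key plus `column_a`) and made-up data. It keeps the lowest `id` in each group as the rule for deciding which row survives, and it needs SQLite 3.25 or later for `ROW_NUMBER()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, column_a TEXT);
    INSERT INTO items (column_a) VALUES ('a'), ('b'), ('a'), ('c'), ('b'), ('a');
    """
)

conn.execute(
    """
    WITH Duplicates AS (
        -- Rows whose column_a value occurs more than once
        SELECT id, column_a
        FROM items
        WHERE column_a IN (
            SELECT column_a FROM items GROUP BY column_a HAVING COUNT(*) > 1
        )
    ), TargetRows AS (
        -- Number the rows in each group; rn = 1 (lowest id) is the keeper
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY column_a ORDER BY id) AS rn
            FROM Duplicates
        )
        WHERE rn > 1
    )
    -- Delete everything identified as an unwanted duplicate
    DELETE FROM items WHERE id IN (SELECT id FROM TargetRows)
    """
)
conn.commit()

remaining = conn.execute("SELECT id, column_a FROM items ORDER BY id").fetchall()
print(remaining)
```

In SQL Server, the last step is often written as a DELETE issued directly against a CTE over the single table that carries the ROW_NUMBER() column (deleting the rows where rn > 1). This works because deleting through such a CTE deletes from the underlying table.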
Say we have a fruit table that receives a high number of reads and also inserts, but almost no updates or deletes.
We have two columns that store values with a small number of options. Let's say category [Banana, apple, orange or pear] and status [finished, ongoing, spoiled, destroyed or ok].
Finally, we have a column holding the owner's last name.
Notes:
I will search sometimes by category and other times by status.
In all cases, last_name will be used in the search.
I will always perform an exact match on category/status, but a "starts with" match on last name.
Ex of common queries:
SELECT * FROM fruit_table WHERE category='BANANA' and last_name LIKE 'Cool%'
SELECT * FROM fruit_table WHERE status='Spoiled' and last_name LIKE 'Co%'
SELECT * FROM fruit_table WHERE category='BANANA' and last_name LIKE 'smith%'
How can I prepare the table so that we get low response times? Will an index help, given that the values in these columns are not very dispersed (few distinct values)? Might a bitmap index help here? What about partitioning?
Finally, apologies for the title; I did not know how to formulate it properly.
Bitmap indexes help immensely with items that have a limited number of available choices.
A standard b-tree index (a non-clustered index in SQL Server) will work well for the last_name column.
I would do those two first, as they are easy, and then see how things work.
It is generally bad practice to optimize prematurely. However, adding indices is a quick way to increase speed without much effort. For more information on indices in Oracle, read this question.
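The b-tree half of this advice can be checked in a query plan. This is a minimal sketch using Python's `sqlite3`, and several details are assumptions. SQLite has no bitmap indexes, so only the index on last_name is shown. The `id` column is made up. The column is declared `COLLATE NOCASE` because SQLite's default case-insensitive LIKE can use an index for a prefix pattern only under that collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE fruit_table (
           id        INTEGER PRIMARY KEY,
           category  TEXT,
           status    TEXT,
           last_name TEXT COLLATE NOCASE  -- lets SQLite's default LIKE use the index
       )"""
)
# Plain b-tree index on the column used for the prefix search.
conn.execute("CREATE INDEX idx_fruit_last_name ON fruit_table (last_name)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM fruit_table WHERE category = 'BANANA' AND last_name LIKE 'Cool%'"
).fetchall()
detail = " | ".join(row[-1] for row in plan_rows)
print(detail)
```

The plan should report a search on idx_fruit_last_name with a range on last_name, meaning the prefix pattern 'Cool%' becomes an index range instead of a full scan. A pattern with a leading wildcard, such as '%ool', cannot be served this way.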
I need to store multiple 4-letter strings for each database row, but the number of 4-letter strings can differ from row to row.
So would it be easier to set up a new table and add a new row for each 4-letter string, with the ID of the related row in the other table?
For normalisation reasons and performance, as well as being able to perform efficient queries later, you would want to store them in a related table.
Main : ID, other columns
Related : Main_ID, 4-letter-string
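This is a minimal sketch of the two-table layout using Python's `sqlite3`. The `code` column name, the length check, the composite primary key, the index and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Main (
        ID INTEGER PRIMARY KEY
        -- other columns
    );
    CREATE TABLE Related (
        Main_ID INTEGER NOT NULL REFERENCES Main (ID),
        code    TEXT NOT NULL CHECK (length(code) = 4),  -- one 4-letter string
        PRIMARY KEY (Main_ID, code)  -- no duplicate string per Main row
    );
    -- Supports lookups by string value
    CREATE INDEX idx_related_code ON Related (code);
    INSERT INTO Main (ID) VALUES (1), (2);
    INSERT INTO Related VALUES (1, 'ABCD'), (1, 'WXYZ'), (2, 'QRST');
    """
)

# All strings for one Main row (served by the composite primary key).
codes = [c for (c,) in conn.execute(
    "SELECT code FROM Related WHERE Main_ID = ? ORDER BY code", (1,)
)]
# All Main rows that have a given string (served by idx_related_code).
owners = [m for (m,) in conn.execute(
    "SELECT Main_ID FROM Related WHERE code = ?", ("WXYZ",)
)]
print(codes, owners)
```

Each Main row can have any number of strings, and both directions of lookup can use an index.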
If there is nothing else you will store in the Main table, then just store the strings as multiple rows and relate them via a common ID.
You can store them in one record and still search efficiently if FULLTEXT searching is enabled, but I doubt your 4-letter strings are natural-language words, so it may not suit as well.