SQL Server - Query String Greater Or Equal To

I am attempting to optimise a query that is causing problems as my application scales.
The table contains two columns, FROM and TO, each holding a string value. Here is an example:
Row | From | To
----+------+---
1   | AA   | Z
2   | B    | C
3   | JA   | JZ
4   | JM   | JZ
The query is passed a name (JOHN) and should return a list of ranges from the table that could contain the name.
select * from Ranges where [From] <= 'JOHN' and [To] >= 'JOHN'
Using the table above this would return rows 1, 3 and 4, since 'JM' <= 'JOHN' <= 'JZ'.
The problem I am having is one of query consistency.
All indexes are in place, but if I search for JOHN the query returns in 20 milliseconds, whereas MARK takes 250 milliseconds.
Query Analyzer shows that the JOHN search actually touches more rows than MARK, so I'm struggling to understand how or why MARK takes so long.
If the difference were 20 to 40 milliseconds I could live with that, but 250 is so large a difference that the overall performance of my application is terrible.
Does anybody have any idea how I could narrow down why I get such variance in my queries, or a better way of storing and searching for string ranges (which can contain letters and numbers)?
Many thanks in advance.
EDIT - One thing I forgot to mention was that the original table contains approximately 15 million rows (it's actually postcodes).
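For reference, a minimal sketch of the setup as I understand it (the Ranges table name comes from the query above; column types and sizes are guesses, since the real data is postcodes). FROM and TO must be bracketed because they are reserved words:

CREATE TABLE Ranges (
    Id INT IDENTITY PRIMARY KEY,
    [From] VARCHAR(10) NOT NULL,
    [To]   VARCHAR(10) NOT NULL
);
-- An index on ([From], [To]) gives the optimizer a seek on the leading
-- column, but every row with [From] <= @name must still be examined to
-- test [To] >= @name. The size of that candidate range differs from
-- name to name, which is one plausible source of the 20 ms vs 250 ms
-- variance.
CREATE INDEX IX_Ranges_From_To ON Ranges ([From], [To]);

DECLARE @name VARCHAR(10) = 'JOHN';
SELECT Id, [From], [To]
FROM Ranges
WHERE [From] <= @name AND [To] >= @name;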

Same data set - More rows with fewer columns or fewer rows with more columns

Database - MariaDB
The requirement is to store the end-of-day closing balance for different customers across different account types (around 15 account types).
Approach 1 (50 Million rows X 10 Columns)
Name | Date  | Amount | Account
A    | 1 Jan | 100    | Saving
A    | 1 Jan | 200    | Current
B    | 1 Jan | 300    | Saving
Approach 2 (10 Million rows x 25 columns)
Name | Date  | Saving | Current
A    | 1 Jan | 100    | 200
B    | 1 Jan | 300    | 0
The index will be on (Name, Date) for both tables; Approach 1 will have an index on the Account column as well.
Query
Each query will have a WHERE clause on Name and Date, as sketched below.
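For concreteness, a sketch of the two layouts in MariaDB DDL (the table names balances_tall and balances_wide are mine, not from the post, and the column sizes are guesses):

-- Approach 1: one row per (customer, date, account type).
CREATE TABLE balances_tall (
    Name    VARCHAR(50)   NOT NULL,
    `Date`  DATE          NOT NULL,
    Account VARCHAR(20)   NOT NULL,
    Amount  DECIMAL(15,2) NOT NULL,
    PRIMARY KEY (Name, `Date`, Account),
    KEY idx_account (Account)
);

-- Approach 2: one row per (customer, date), one column per account type.
CREATE TABLE balances_wide (
    Name      VARCHAR(50)   NOT NULL,
    `Date`    DATE          NOT NULL,
    Saving    DECIMAL(15,2) NOT NULL DEFAULT 0,
    `Current` DECIMAL(15,2) NOT NULL DEFAULT 0,
    -- ...plus one more column for each remaining account type
    PRIMARY KEY (Name, `Date`)
);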
The size difference between the two designs is within an order of magnitude, and really within a fairly small factor. Size aside, the first design is probably more desirable, for several reasons.
First, if you at some later point need to support additional account types, you can simply keep adding rows with the same table structure. Adding support for a new account type to the second design, on the other hand, requires changing the structure of the table.
From a reporting point of view, it is probably advantageous to have the account type as a single separate column. This makes aggregation and filter queries easy to write, and with the right index these queries can be made fast. The second design forces you to hardwire your queries to separate columns for each account type, which can turn into technical debt later on should the table design need to change.
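As a hedged illustration of that last point, using the balances_tall and balances_wide names sketched above: a report across account types is a single GROUP BY in the first design but a hand-maintained column list in the second.

-- Approach 1: account type is data; a new type needs no query changes.
SELECT Account, SUM(Amount) AS Total
FROM balances_tall
WHERE Name = 'A' AND `Date` = '2024-01-01'
GROUP BY Account;

-- Approach 2: account types are columns; every such report must name
-- them all and must be edited whenever a type is added.
SELECT Saving + `Current` AS Total
FROM balances_wide
WHERE Name = 'A' AND `Date` = '2024-01-01';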

SSAS - MDX calculated member

I have a fact table that details individual line amounts for orders placed by my organisation. At line level I have also included the total order amount, as it's possible we might need that level of detail at some point.
Here's an example of what I've got:-
+------------+------------+---------------+------------+---------------------+
| BookingKey | Booking_ID | Category_FKey | Line_Value | Total_Booking_Value |
+------------+------------+---------------+------------+---------------------+
|          1 |         12 |             8 |        150 |                 700 |
|          2 |         12 |             4 |        150 |                 700 |
|          3 |         12 |             5 |        300 |                 700 |
|          4 |         12 |             4 |        100 |                 700 |
+------------+------------+---------------+------------+---------------------+
As you can see, the Total_Booking_Value here is the sum of the Line_Value for the booking in the example (Booking_ID = 12).
The Category_FKey looks up to a Categories dimension.
Using this structure I've created a simple cube, and it mostly works fine.
The issue I have is that I'd like to be able to view the Total Line_Value amount, and somehow include the Total_Booking_Value alongside it.
So, for example, I might add the Categories dimension as a filter and want to filter by, say, Category_FKey = 4.
If this were the case I'd want the aggregates to tell me that the total Line_Value was 250 (for BookingKeys 2 and 4), while the Total_Booking_Value should be 700. Using normal aggregation (i.e. SUM) I'm getting the Total_Booking_Value as 1400 (obviously, because it's adding 700 * 2 for the two rows the cube would return).
So, the way I see it I'd like to create an MDX calculation that somehow takes the Total_Booking_Value and gives just the value for the Booking in question.
Should this be done using some kind of average, or division by the Distinct number of items? I can't figure this out. I tried something like this:-
create member currentcube.measures.[Calculated Booking Value]
as
[Measures].[Total_Booking_Value] / count(Measures.Booking_ID);
But this isn't working.
Hopefully this makes sense and you can point me in the right direction.
I find it strange that booking_ID is a measure - intuitively it strikes me as something that would be an attribute and therefore a hierarchy - in which case you'd be able to do the count like this:
[Measures].[Total_Booking_Value]
/
COUNT(EXISTING [Booking].[Booking_ID].[Booking_ID].members)
A straightforward solution would be to have two fact tables: one with granularity booking key and one with granularity booking id. The first would contain all columns except the total booking value, and the second would contain just the booking id and total booking value columns.
Then each measure would easily be summable.
The reference type between the second fact table and the Category dimension could be configured as many-to-many via the first fact table. Thus you would see the full values of the involved bookings for each selected category, automatically eliminating double counting.
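A hedged sketch of that two-fact-table layout in the relational source (the table names are illustrative, not from the post):

-- Line-level fact: one row per booking line; no total column, so it
-- can never be double counted.
CREATE TABLE FactBookingLine (
    BookingKey    INT   NOT NULL PRIMARY KEY,
    Booking_ID    INT   NOT NULL,
    Category_FKey INT   NOT NULL,
    Line_Value    MONEY NOT NULL
);
-- Header-level fact: one row per booking, so SUM(Total_Booking_Value)
-- counts each booking's total exactly once.
CREATE TABLE FactBookingHeader (
    Booking_ID          INT   NOT NULL PRIMARY KEY,
    Total_Booking_Value MONEY NOT NULL
);

In the cube, the header measure group would then relate to the Category dimension as many-to-many through the line-level measure group, as described above.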

Optimal View Design To Find Mismatches Between Two Sets of Data

A bit of background...my company utilizes a piece of software that stores information about a mortgage loan in independent fields. These fields are broken up across many tables in the loan database.
My current dilemma revolves around designing a view (or views) that will allow me to find mismatched data on a subset of loans between the underwriting side of our software and the lock side.
Here is a quick example of the data returned from the two views that already exist:
UW View
transID | DTIField | LTVField | MIField
50000 | 37.5 | 85.0 | 1
Lock View
transID | DTIField | LTVField | MIField
50000 | 42.0 | 85.0 | 0
In the above situation, the view should return the fields that do not match (in this case DTIField and MIField). I have already built a comparison view that uses a series of CASE statements to return 0 for not matched or 1 for matched:
transID | DTIField | LTVField | MIField
50000 | 0 | 1 | 0
This is fine in itself but it is creating a bit of an issue downstream on the reporting side. We want to be able to build a report that would display only those transIDs that have mismatched data and show which columns are not matched. Crystal Reports is the reporting solution in question.
Some specifics about the data sets: we have 27 items of the loan that we are comparing (so 54 fields in total). There are over 4000 loans in the system and growing. There are already indexes on the transID fields.
How would you structure the view to return all the data needed for the report? We can do a good amount of work in Crystal Reports but ideally much of the logic would be handled in MSSQL.
Thanks for any assistance.
I think there should be no issue in comparing the 27 columns for a given row. Since you'll be reading each row just once and comparing the columns of that row in both tables, it shouldn't really pose any performance problems. You can use a hash function such as HASHBYTES to assign a hash value to the combination of these 27 fields in each source, and then compare those values to decide which rows the view should return. This should result in some performance improvement. Testing will reveal more.
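A minimal sketch of that idea, assuming the two existing views are named UW_View and Lock_View and share the columns shown above (only three of the 27 fields are spelled out here, and SHA2_256 needs SQL Server 2012 or later):

SELECT uw.transID,
       -- 1 = matched, 0 = mismatched, as in the comparison view above
       CASE WHEN uw.DTIField = lck.DTIField THEN 1 ELSE 0 END AS DTIField,
       CASE WHEN uw.LTVField = lck.LTVField THEN 1 ELSE 0 END AS LTVField,
       CASE WHEN uw.MIField  = lck.MIField  THEN 1 ELSE 0 END AS MIField
FROM UW_View AS uw
JOIN Lock_View AS lck
    ON lck.transID = uw.transID
-- Hashing the concatenated fields filters the result down to rows with
-- at least one mismatch, so the report never sees all-matched loans.
WHERE HASHBYTES('SHA2_256',
        CONCAT(uw.DTIField, '|', uw.LTVField, '|', uw.MIField))
   <> HASHBYTES('SHA2_256',
        CONCAT(lck.DTIField, '|', lck.LTVField, '|', lck.MIField));

Crystal Reports can then suppress or highlight columns based on the 0/1 flags.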

Retrieving data from 2 tables that have a 1 to many relationship - more efficient with 1 query or 2?

I need to selectively retrieve data from two tables that have a 1 to many relationship. A simplified example follows.
Table A is a list of events:
Id | TimeStamp | EventTypeId
--------------------------------
1 | 10:26... | 12
2 | 11:31... | 13
3 | 14:56... | 12
Table B is a list of properties for the events. Different event types have different numbers of properties. Some event types have no properties at all:
EventId | Property | Value
------------------------------
1 | 1 | dog
1 | 2 | cat
3 | 1 | mazda
3 | 2 | honda
3 | 3 | toyota
There are a number of conditions that I will apply when I retrieve the data; however, they all revolve around table A. For instance, I may want only events on a certain day, or only events of a certain type.
I believe I have two options for retrieving the data:
Option 1
Perform two queries: first query table A (with a WHERE clause) and store the data somewhere, then query table B (joining on table A in order to use the same WHERE clause) and "fill in the blanks" in the data retrieved from table A.
This option requires SQL Server to perform two searches through table A; however, the two resulting data sets contain no duplicate data.
Option 2
Perform a single query, joining table A to table B with a LEFT JOIN.
This option requires only one search of table A, but the resulting data set will contain many duplicated values.
Conclusion
Is there a "correct" way to do this or do I need to try both ways and see which one is quicker?
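For reference, a sketch of the two options, assuming the tables are named TableA and TableB with the columns shown (the names and the EventTypeId filter are illustrative):

-- Option 1: two queries; no event columns are duplicated in the results.
SELECT Id, TimeStamp, EventTypeId
FROM TableA
WHERE EventTypeId = 12;

SELECT b.EventId, b.Property, b.Value
FROM TableB AS b
JOIN TableA AS a ON a.Id = b.EventId
WHERE a.EventTypeId = 12; -- same WHERE clause, second search of table A

-- Option 2: one query; the event columns repeat once per property row,
-- and the LEFT JOIN keeps events that have no properties at all.
SELECT a.Id, a.TimeStamp, a.EventTypeId, b.Property, b.Value
FROM TableA AS a
LEFT JOIN TableB AS b ON b.EventId = a.Id
WHERE a.EventTypeId = 12;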
Example: an explicit join -
Select E.Id, E.Name from Employee E join Dept D on E.DeptId = D.Id
and a subquery something like this -
Select E.Id, E.Name from Employee E where E.DeptId in (Select Id from Dept)
When I consider performance, which of the two queries would be faster, and why?
I would expect the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN is a very slow operator, since SQL Server normally evaluates it as a series of WHERE clauses separated by OR (WHERE x = Y OR x = Z OR ...).
As with all things SQL, though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both Id columns? That will help a lot) among other things.
The only real way to tell with 100% certainty which is faster is to turn on performance tracking (IO statistics is especially useful) and run them both, as shown below. Make sure to clear your cache between runs!
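For instance, a quick way to measure both queries (the SET and DBCC commands below are standard T-SQL; clear the caches only on a non-production server):

-- Report I/O and timing details for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Clear cached data pages and query plans so neither query
-- benefits from a warm cache.
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;

-- Run both candidates back to back and compare the Messages output.
SELECT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id;
SELECT E.Id, E.Name FROM Employee E WHERE E.DeptId IN (SELECT Id FROM Dept);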

Ranking of Full Text Search (SQL Server)

For the last couple of hours I have been messing with all sorts of variations of SQL Server full-text search, but I am still unable to figure out how the ranking works. I have come across a couple of examples that really confuse me as to how they rank higher than others. For example:
I have a table with 5 full-text indexed columns, plus more that are not indexed. All are nvarchar fields.
I am running this query (well, almost: I retyped it with different names):
SET @SearchString = REPLACE(@Name, ' ', '*" OR "') -- split the words with an OR between them
SET @SearchString = '"' + @SearchString + '*"'
PRINT @SearchString;
SELECT ms.ID, ms.LastName, ms.DateOfBirth, ms.Aka, KEY_TBL.RANK, ms.MiddleName, ms.FirstName
FROM View_MemberSearch AS ms
INNER JOIN CONTAINSTABLE(View_MemberSearch, (LastName, FirstName, MiddleName, Aka, DateOfBirth), @SearchString) AS KEY_TBL
    ON ms.ID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 0
ORDER BY KEY_TBL.RANK DESC;
Thus if I search for 11/05/1964 JOHN JACKSON, @SearchString becomes "11/05/1964*" OR "JOHN*" OR "JACKSON*" and I get these results:
ID | First Name | Middle Name | Last Name | AKA  | Date of Birth | SQL Server RANK
---+------------+-------------+-----------+------+---------------+----------------
 1 | DAVE       | JOHN        | MATHIS    | NULL | 11/23/1965    | 192
 2 | MARK       | JACKSON     | GREEN     | NULL | 05/29/1998    | 192
 3 | JOHN       | NULL        | JACKSON   | NULL | 11/05/1964    | 176
 4 | JOE        | NULL        | JACKSON   | NULL | 10/04/1994    | 176
So finally my question: I don't see how rows 1 and 2 are ranked above row 3, or why row 3 is ranked the same as row 4. Row 3 should have the highest rank by far, seeing as the search string matches the first name and last name as well as the date of birth.
If I change the OR to AND I don't get any results.
I've found AND and OR clauses don't apply across columns. Create an indexed view that merges the columns and you'll get better results. Look at my past questions and you'll find information that suits your scenario.
I've also found I'm better off not appending a '*'. I thought it would turn up more matches, but it tended to return worse results (particularly for long words). As a middle ground you might append a * only to longer words.
The example case you give is definitely weird.
It's not entirely equivalent, but perhaps this question I asked (How-to: Ranking Search Results) could be of assistance?
What happens if you remove the DoB criteria?
MS full-text search is really a black box that's hard to understand and customize.
You pretty much take it as is, unlike Lucene, which is great for customization.
Thank you guys.
Frank, you were correct that AND and OR do not apply across columns; this was something I did not notice at first.
To get the best results I had to merge all 5 columns into 1 column in a view and then search on that single column, as sketched below. Doing so gave me the exact results I wanted without any extras.
My actual search string, after converting it, ended up being "Word1*" AND "Word2*".
Using the % sign still did not do what MSDN said it should: if I searched for the word josh and it was changed into "Josh%", then "Joshua" would not be found. Strangely, with "Josh*", Joshua was found.
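A hedged sketch of that merged-column view (the view name, base table name, and SearchText alias are assumptions based on the posts above):

-- Concatenating the five searchable columns into one means AND/OR
-- terms in CONTAINSTABLE can match across what used to be separate
-- columns. SCHEMABINDING plus a unique clustered index on ID are
-- required before a full-text index can be built on the view.
CREATE VIEW dbo.View_MemberSearchMerged
WITH SCHEMABINDING
AS
SELECT ID,
       ISNULL(FirstName, '') + ' ' +
       ISNULL(MiddleName, '') + ' ' +
       ISNULL(LastName, '') + ' ' +
       ISNULL(Aka, '') + ' ' +
       ISNULL(DateOfBirth, '') AS SearchText
FROM dbo.Members;

With a full-text index on SearchText, "Word1*" AND "Word2*" then matches even when the two words live in different source columns.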
