I need to build one MSSQL query that selects one row that is the best match.
Ideally, we have a match on street, zip code and house number.
Only if that does not deliver any results is a match on just street and zip code sufficient.
I have this query so far:
SELECT TOP 1 * FROM realestates
WHERE
(Address_Street = '[Street]'
AND Address_ZipCode = '1200'
AND Address_Number = '160')
OR
(Address_Street = '[Street]'
AND Address_ZipCode = '1200')
MSSQL currently gives me the result where the Address_Number is NOT 160, so it seems like the 2nd clause (where only street and zipcode have to match) is taking precedence over the 1st. If I switch around the two OR clauses, same result :)
How could I prioritize the first OR clause, so that MSSQL stops looking for other results if we found a match where the three fields are present?
The problem here isn't the WHERE (though it is a "problem"), it's the lack of an ORDER BY. You have a TOP (1), but you have nothing that tells the data engine which row is the "top" row, so an arbitrary row is returned. You need to provide logic in the ORDER BY to tell the data engine which is the "first" row. With the rudimentary logic you have in your question, this would likely be:
SELECT TOP (1)
{Explicit Column List}
FROM realestates
WHERE Address_Street = '[Street]'
AND Address_ZipCode = '1200'
ORDER BY CASE Address_Number WHEN '160' THEN 1 ELSE 2 END;
You can't prioritize anything in the WHERE clause. It always results in ALL the matching rows. What you can do is use TOP or FETCH to limit how many results you will see.
However, in order for this to be effective, you MUST have an ORDER BY clause. SQL tables are unordered sets by definition. This means without an ORDER BY clause the database is free to return rows in any order it finds convenient. Mostly this will be the order of the primary key, but there are plenty of things that can change this.
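The ranking idea from both answers can be tried end to end. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server, so LIMIT 1 plays the role of TOP (1); the sample rows, the id column used as a tie-breaker, and the best_match helper are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; LIMIT 1 plays the role of TOP (1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE realestates ("
    "id INTEGER PRIMARY KEY, Address_Street TEXT, "
    "Address_ZipCode TEXT, Address_Number TEXT)"
)
conn.executemany(
    "INSERT INTO realestates (Address_Street, Address_ZipCode, Address_Number) "
    "VALUES (?, ?, ?)",
    [
        ("Main Street", "1200", "12"),   # street + zip code match only
        ("Main Street", "1200", "160"),  # full match
        ("Main Street", "1300", "160"),  # wrong zip code, filtered out
    ],
)

def best_match(street, zip_code, number):
    # Filter on the loose criteria, rank exact house-number matches first,
    # and break remaining ties on id so the result is deterministic.
    return conn.execute(
        "SELECT Address_Street, Address_ZipCode, Address_Number "
        "FROM realestates "
        "WHERE Address_Street = ? AND Address_ZipCode = ? "
        "ORDER BY CASE WHEN Address_Number = ? THEN 1 ELSE 2 END, id "
        "LIMIT 1",
        (street, zip_code, number),
    ).fetchone()

exact = best_match("Main Street", "1200", "160")     # house number found
fallback = best_match("Main Street", "1200", "999")  # falls back to street + zip code
```

When no row has the requested house number, every candidate gets rank 2 and the tie-breaker decides, which is exactly the "street and zip code is sufficient" fallback from the question.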
Using MS Access and I have two tables, one is categories and the other is content.
My initial SQL statement, included below,takes a count of each content associated to a category and returns the count associated with each category.
So for each CATEGORY, I'm simply trying to return another count in which I count CONTENT that have a specific user level and are not deleted for each CATEGORY.
Below is what I am struggling with as I am not certain you can actually use COUNT like this.
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
This is the full select statement with my addition but not working.
SELECT
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
Last(CONTENT.contentDate) AS contentDate,
CATEGORY.isDeleted AS categoryDeleted,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) AS countTotal,
COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null)) AS countDeleted,
COUNT([CONTENT.categoryId]) - COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null))AS countDifference,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
FROM CATEGORY
LEFT JOIN CONTENT ON
CATEGORY.categoryId = CONTENT.categoryId
GROUP BY
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
CATEGORY.isDeleted
HAVING (((CATEGORY.isDeleted)=0))
ORDER BY
CATEGORY.categoryTitle
You should be able to use the following:
SUM(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2,1,NULL)) AS userLevelCount
COUNT will not count NULL, but it will count zero. SUM will calculate the sum of all 1's - that's a second way of achieving the same.
IIf is built into Access; in SQL Server, IIF has been available since SQL Server 2012.
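The difference between counting zeros and counting NULLs is easy to check on a small table. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for Access, with CASE expressions in place of IIf; the four sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (contentID INTEGER, isDeleted INTEGER, userLevel INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 0, 2), (2, 0, 1), (3, 1, 2), (4, 0, 2)],
)

count_zero, count_null, sum_flag, level2_live = conn.execute(
    "SELECT "
    # COUNT sees a non-NULL value (1 or 0) on every row, so it counts all rows.
    " COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END),"
    # With NULL in the ELSE branch, deleted rows are skipped.
    " COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE NULL END),"
    # SUM over 1/0 flags gives the same number as the NULL-based COUNT.
    " SUM(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END),"
    # Both conditions in one expression: not deleted AND user level 2.
    " COUNT(CASE WHEN isDeleted = 0 AND userLevel = 2 THEN contentID END)"
    " FROM content"
).fetchone()
```

The last expression is one possible way to get "not deleted and a specific user level" in a single aggregate instead of subtracting two counts.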
I believe I found the solution
Count(IIf([CONTENT.userLevel]=2,[CONTENT.contentID],Null)) AS countDifference2
This will return the count difference for CONTENT for each CATEGORY that isn't deleted and has a specific user level.
I have something of a Search App. There are 7 fields (first name, last name, phone, street, city, shop number, credit card number) where user can write parameters and it's gonna find him clients in the database. Everything is working with AND condition, so when first name is 'Andy' and last name is 'Larkin' is only gonna find Andy Larkins etc. User can leave a field empty, that means when first name is 'Andy' then it should find all the Andys etc. Database looks like this:
The 'Relation' table is to connect person and a shop. Person must have 1 address, 1 shop, can have multiple addresses, multiple shops and no credit card/multiple credit cards. Now, I have to handle all the filtering in a single query, I can't check some conditions before and then construct the query another way, I just don't have that option.
When I search by first name or last name it's fast (both in Person table), but when I search by phone number, or credit card number - it's taking a lot of time. There is a lot of data in the database, but still, my query is bad, I'm not really good at writing queries, especially in Oracle. Here's the query:
SELECT
PERSON.personId,
PERSON.firstName,
PERSON.lastName,
ADDRESS.street,
ADDRESS.city,
ADDRESS.phoneNumber
FROM
PERSON
LEFT JOIN ADDRESS ON PERSON.personId = ADDRESS.personId
LEFT JOIN RELATION ON PERSON.personId = RELATION.personId
LEFT JOIN SHOPS ON RELATION.shopId = SHOPS.shopId
LEFT JOIN CREDITCARDS ON PERSON.personId = CREDITCARDS.personId
WHERE
PERSON.firstName = NVL(?, PERSON.firstName) AND
PERSON.lastName = NVL(?, PERSON.lastName) AND
ADDRESS.phoneNumber = NVL(?, ADDRESS.phoneNumber) AND
ADDRESS.street = NVL(?, ADDRESS.street) AND
ADDRESS.city = NVL(?, ADDRESS.city) AND
SHOPS.shopNumber = NVL(?, SHOPS.shopNumber) AND
CREDITCARDS.creditCardNumber = NVL(?, CREDITCARDS.creditCardNumber);
The parameters that user left empty are passed as NULLS, that's why I use NVL. When I delete all conditions and leave let's say a credit card number, then it's fast, so I guess that means that all the unnecessary condition checking is slowing the query, and I don't really need that condition checking in most cases, it's just there in case a user passes something.
If I would have the option to check for conditions and only then construct a query then I would just add the conditions that are needed, but I don't have that option. I was thinking about adding some 'IFs' in the query, but I'm not sure that's even possible, all I could find was 'IF/CASE WHEN' but couldn't find any examples that apply to my case. I also tried this:
...WHERE (? IS NULL OR (PERSON.firstName = NVL(?, PERSON.firstName))) AND...
That didn't help, and I got tons of duplicated (different only in address or something - person can have multiple addresses) results (even with 'DISTINCT').
It's not homework, that database is huge with lot of other fields, but I simplified it here, there is also a lot of data there. Thanks for help.
A few things to think about here.
Be careful about queries that might not make sense; such as those that query a credit card number and an address. Queries of that nature fall into a fan trap.
Creating referential integrity constraints in the database will allow the optimizer to do join elimination.
It would be much better for the optimizer, if you could build the query "where clause" dynamically, rather than using NVL functions.
A nested select on the shops might improve performance especially considering it's outer joined. The query below should be enough to get you the idea.
Regarding de-duplication- it's hard because you are selecting Id's and 'distinct' won't help. You'd probably have to use the group by syntax and that might slow the query even more.
If sorting can be done on the client it might help with performance. If the amount of data being returned is significant due to fan-out of relational data and group by isn't a good option then creating a stored procedure might be the best option so most the work is done on database and minimal data over the wire.
SELECT
p.personId,
p.firstName,
a.city,
a.phoneNumber,
shop.shopNumber
FROM
PERSON p,
ADDRESS a,
CREDITCARDS c,
(select r.personId, s.shopId, s.shopNumber from shops s, relation r
where s.shopId = r.shopId) shop
WHERE
p.personId = a.personId AND
p.personId = c.personId AND
p.personId = shop.personId (+)
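The point about building the where clause dynamically can be sketched as follows, in case the application layer ever gets that option. This is only an illustration, assuming Python with SQLite standing in for Oracle (COALESCE replaces NVL); the FILTER_COLUMNS mapping, the build_search helper, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical mapping from search-form field to column (names follow the question's schema).
FILTER_COLUMNS = {
    "firstName": "p.firstName",
    "lastName": "p.lastName",
    "creditCardNumber": "c.creditCardNumber",
}

def build_search(filters):
    """Return (sql, params) containing only the conditions the user filled in."""
    clauses, params = [], []
    for field, column in FILTER_COLUMNS.items():
        value = filters.get(field)
        if value is not None:
            clauses.append(f"{column} = ?")  # column comes from the fixed mapping
            params.append(value)             # value is always bound, never inlined
    sql = ("SELECT p.personId, p.firstName, p.lastName, c.creditCardNumber "
           "FROM person p LEFT JOIN creditcards c ON c.personId = p.personId")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (personId INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE creditcards (personId INTEGER, creditCardNumber TEXT);
    INSERT INTO person VALUES (1, 'Andy', 'Larkin'), (2, 'Andy', 'Smith');
    INSERT INTO creditcards VALUES (1, '4000-0000');  -- person 2 has no card
""")

sql, params = build_search({"firstName": "Andy", "creditCardNumber": None})
dynamic_rows = conn.execute(sql, params).fetchall()

# The question's NVL pattern (COALESCE in SQLite) for comparison.
nvl_rows = conn.execute(
    "SELECT p.personId FROM person p "
    "LEFT JOIN creditcards c ON c.personId = p.personId "
    "WHERE p.firstName = COALESCE(?, p.firstName) "
    "AND c.creditCardNumber = COALESCE(?, c.creditCardNumber)",
    ("Andy", None),
).fetchall()
```

Besides giving the optimizer fewer predicates, the dynamic version keeps people who have no credit card: in the NVL form, an outer-joined column that is NULL never equals itself, so those rows drop out even when the filter was left empty.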
My questions are: what indexes are used, in what order, and why, in the following sample?
Query:
SELECT House
FROM myTable
WHERE 1=1
and City='myCity'
and Street='myStreet'
and Color='myColor'
Indexes:
Ind1: City
Ind2: Street
Ind3: Color
Ind4: Street,Color
It depends. If the server has statistics, it will choose the index that filters most effectively. For example:
if City='myCity' returns 100 rows,
if Street='myStreet' returns 1000 rows,
if Color='myColor' returns 10000 rows,
then the City index will be used. The same logic applies to composite indexes.
The optimizer tries to get the smallest set first; the remaining filters are then applied to that set.
This requires up-to-date statistics; otherwise the wrong index might be used.
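Since the choice depends on the engine and on the data, the most reliable answer comes from asking the engine for its plan. The question does not name a DBMS, so this is a minimal sketch that assumes SQLite (through Python's sqlite3 module) and invented data; other engines have their own EXPLAIN facilities and may pick differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (House TEXT, City TEXT, Street TEXT, Color TEXT);
    CREATE INDEX Ind1 ON myTable (City);
    CREATE INDEX Ind2 ON myTable (Street);
    CREATE INDEX Ind3 ON myTable (Color);
    CREATE INDEX Ind4 ON myTable (Street, Color);
""")
# Invented data: 100 distinct cities, 10 streets, 3 colors.
rows = [
    (f"house{i}", f"city{i % 100}", f"street{i % 10}", f"color{i % 3}")
    for i in range(3000)
]
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
conn.execute("ANALYZE")  # refresh the statistics the planner relies on

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT House FROM myTable "
    "WHERE City = 'city1' AND Street = 'street1' AND Color = 'color1'"
).fetchall()
# The last column of each plan row is the human-readable description.
plan_text = " | ".join(row[-1] for row in plan)
print(plan_text)
```

Changing the value distributions and re-running ANALYZE is a quick way to see the chosen index change with the statistics.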
I'm using the lsqlite3 lua wrapper and I'm making queries into a database. My DB has ~5million rows and the code I'm using to retrieve rows is akin to:
db = lsqlite3.open('mydb')
local temp = {}
local sql = "SELECT A,B FROM tab where FOO=BAR ORDER BY A DESC LIMIT N"
for row in db:nrows(sql) do temp[row['key']] = row['col1'] end
As you can see I'm trying to get the top N rows sorted in descending order by FOO (I want to get the top rows and then apply the LIMIT not the other way around). I indexed the column A but it doesn't seem to make much of a difference. How can I make this faster?
You need to index the column on which you filter (i.e. with the WHERE clause). The reason is that ORDER BY comes into play after filtering, not the other way around.
So you probably should create an index on FOO.
Can you post your table schema?
UPDATE
Also you can increase the sqlite cache, e.g.:
PRAGMA cache_size=100000
You can adjust this depending on the memory available and the size of your database.
UPDATE 2
If you want to have a better understanding of how your query is handled by sqlite, you can ask it to provide you with the query plan:
http://www.sqlite.org/eqp.html
UPDATE 3
I did not understand your context properly with my initial answer. If you are to ORDER BY on some large data set, you probably want to use that index, not the previous one, so you can tell sqlite to not use the index on FOO this way:
SELECT a, b FROM foo WHERE +a > 30 ORDER BY b
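The effect of the unary + can be observed with EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module rather than the Lua wrapper from the question; the table contents and the index names idx_a and idx_b are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INTEGER, b INTEGER);
    CREATE INDEX idx_a ON foo (a);
    CREATE INDEX idx_b ON foo (b);
""")
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(i % 50, i) for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plain_sql = "SELECT a, b FROM foo WHERE a > 30 ORDER BY b"
plus_sql = "SELECT a, b FROM foo WHERE +a > 30 ORDER BY b"

plain_plan = plan(plain_sql)  # the planner is free to use idx_a here
plus_plan = plan(plus_sql)    # +a keeps the WHERE term from using idx_a
print(plain_plan)
print(plus_plan)

# The unary + only influences planning; the result set is unchanged.
same_rows = conn.execute(plain_sql).fetchall() == conn.execute(plus_sql).fetchall()
```

Printing both plans for the real table shows whether the sort is served by an index or by a temporary B-tree, which is usually what dominates the cost on millions of rows.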
Is it possible to have selective queries in PostgreSQL which select different tables/columns based on values of rows already selected?
Basically, I've got a table in which each row contains a sequence of two to five characters (tbl_roots), optionally with a length field which specifies how many characters the sequence is supposed to contain (it's meant to be made redundant once I figure out a better way, i.e. by counting the length of the sequences).
There are four tables containing patterns (tbl_patterns_biliteral, tbl_patterns_triliteral, ...etc), each of which corresponds to a root_length, and a fifth table (tbl_patterns) which is used to synchronise the pattern tables by providing an identifier for each row, so row #2 in tbl_patterns_biliteral corresponds to the same row in tbl_patterns_triliteral. The four pattern tables are restricted such that no row in tbl_patterns_(bi|tri|quadri|quinqui)literal can have a pattern_id that doesn't exist in tbl_patterns.
Each pattern table has nine other columns, each of which corresponds to an identifier (root_form).
The last table in the database (tbl_words), contains a column for each of the major tables (word_id, root_id, pattern_id, root_form, word). Each word is defined as being a root of a particular length and form, spliced into a particular pattern. The splicing is relatively simple: translate(pattern, '12345', array_to_string(root, '')) as word_combined does the job.
Now, what I want to do is select the appropriate pattern table based on the length of the sequence in tbl_roots, and select the appropriate column in the pattern table based on the value of root_form.
How could this be done? Can it be combined into a simple query, or will I need to make multiple passes? Once I've built up this query, I'll then be able to code it into a PHP script which can search my database.
EDIT
Here's some sample data (it's actually the data I'm using at the moment) and some more explanations as to how the system works: https://gist.github.com/823609
It's conceptually simpler than it appears at first, especially if you think of it as a coordinate system.
I think you're going to have to change the structure of your tables to have any hope. Here's a first draft for you to think about. I'm not sure what the significance of the "i", "ii", and "iii" are in your column names. In my ignorance, I'm assuming they're meaningful to you, so I've preserved them in the table below. (I preserved their information as integers. Easy to change that to lowercase roman numerals if it matters.)
create table patterns_bilateral (
pattern_id integer not null,
root_num integer not null,
pattern varchar(15) not null,
primary key (pattern_id, root_num)
);
insert into patterns_bilateral values
(1,1, 'ya1u2a'),
(1,2, 'ya1u22a'),
(1,3, 'ya12u2a'),
(1,4, 'me11u2a'),
(1,5, 'te1u22a'),
(1,6, 'ina12u2a'),
(1,7, 'i1u22a'),
(1,8, 'ya1u22a'),
(1,9, 'e1u2a');
I'm pretty sure a structure like this will be much easier to query, but you know your field better than I do. (On the other hand, database design is my field . . . )
Expanding on my earlier answer and our comments, take a look at this query. (The test table isn't even in 3NF, but the table's not important right now.)
create table test (
root_id integer,
root_substitution varchar[],
length integer,
form integer,
pattern varchar(15),
primary key (root_id, length, form, pattern));
insert into test values
(4,'{s,ş,m}', 3, 1, '1o2i3');
This is the important part.
select root_id
, root_substitution
, length
, form
, pattern
, translate(pattern, '12345', array_to_string(root_substitution, ''))
from test;
That query returns, among other things, the translation soşim.
Are we heading in the right direction?
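For readers without a PostgreSQL instance at hand, the splicing step can be reproduced outside the database. This is a rough Python emulation of translate(), not PostgreSQL's own implementation; the helper name pg_translate is made up, and the handling of repeated characters in the "from" string is an assumption.

```python
def pg_translate(text, from_chars, to_chars):
    """Mimic PostgreSQL translate(): map from_chars[i] to to_chars[i];
    characters in from_chars with no counterpart in to_chars are removed."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # first occurrence wins (assumed to match PostgreSQL)
        # None tells str.translate to delete the character.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(table)

root = ["s", "ş", "m"]  # the root_substitution array from the test row
word = pg_translate("1o2i3", "12345", "".join(root))
```

Joining the list plays the role of array_to_string(root_substitution, ''), and the unused placeholders 4 and 5 are simply dropped for a three-letter root.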
Well, that's certainly a bizarre set of requirements! Here's my best guess, but obviously I haven't tried it. I used UNION ALL to combine the patterns of different sizes and then filtered them based on length. You might need to move the length condition inside each of the subqueries for speed reasons, I don't know. Then I chose the column using the CASE expression.
select word,
translate(
case root_form
when 1 then patinfo.pattern1
when 2 then patinfo.pattern2
... up to pattern9
end,
'12345',
array_to_string(root.root, '')) as word_combined
from tbl_words word
join tbl_roots root
on word.root_id = root.root_id
join tbl_patterns pat
on word.pattern_id = pat.pattern_id
join (
select 2 as pattern_length, pattern_id, pattern1, ..., pattern9
from tbl_patterns_biliteral bi
union all
select 3, pattern_id, pattern1, pattern2, ..., pattern9
from tbl_patterns_triliteral tri
union all
...same for quad and quin...
) patinfo
on
patinfo.pattern_id = pat.pattern_id
and length(root.root) = patinfo.pattern_length
Consider combining all the different patterns into one pattern_details table with a root_length field to filter on. I think that would be easier than combining them all together with UNION ALL. It might be even easier if you had multiple rows in the pattern_details table and filtered based on root_form. Maybe the best would be to lay out pattern_details with fields for pattern_id, root_length, root_form, and pattern. Then you just join from the word table through the pattern table to the pattern detail that matches all the right criteria.
Of course, maybe I've completely misunderstood what you're looking for. If so, it would be clearer if you posted some example data and an example result.
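The single pattern_details layout suggested above could look roughly like this. It is a sketch only, assuming SQLite through Python's sqlite3 module instead of PostgreSQL: translate() is registered as a Python function because SQLite has none, roots are stored as plain strings rather than arrays, and the table names, the root 'kt', and the row ids are invented (the patterns are taken from the samples earlier in the thread).

```python
import sqlite3

def translate(text, from_chars, to_chars):
    # Stand-in for PostgreSQL's translate(), which SQLite lacks.
    table = {ord(c): (to_chars[i] if i < len(to_chars) else None)
             for i, c in enumerate(from_chars)}
    return text.translate(table)

conn = sqlite3.connect(":memory:")
conn.create_function("translate", 3, translate)
conn.executescript("""
    CREATE TABLE roots (root_id INTEGER PRIMARY KEY, root TEXT);  -- root kept as a string
    CREATE TABLE pattern_details (
        pattern_id INTEGER, root_length INTEGER, root_form INTEGER, pattern TEXT,
        PRIMARY KEY (pattern_id, root_length, root_form));
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, root_id INTEGER,
                        pattern_id INTEGER, root_form INTEGER);
    INSERT INTO roots VALUES (1, 'kt'), (4, 'sşm');
    INSERT INTO pattern_details VALUES
        (1, 2, 1, 'ya1u2a'), (1, 2, 2, 'ya1u22a'), (1, 3, 1, '1o2i3');
    INSERT INTO words VALUES (10, 1, 1, 2), (11, 4, 1, 1);
""")

# One join picks the pattern by id, root length, and root form; no UNION ALL or CASE needed.
rows = conn.execute("""
    SELECT w.word_id, translate(d.pattern, '12345', r.root) AS word_combined
    FROM words w
    JOIN roots r ON r.root_id = w.root_id
    JOIN pattern_details d
      ON d.pattern_id = w.pattern_id
     AND d.root_length = length(r.root)
     AND d.root_form = w.root_form
    ORDER BY w.word_id
""").fetchall()
```

With the length and form stored as ordinary columns, selecting "the right table and the right column" becomes a plain join condition.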