How can I make Sybase's database engine return an unsorted list of records in non-numeric order?
~~~
I need to reproduce an application error. The code selects from a table whose IDs are generated in sequence, and the error occurs when the ID it ends up with is not the last one in the selection.
Let me explain.
ID STATUS
_____________
1234 C
1235 C
1236 O
Above are three IDs. I had code where these would be the results of a
select @p_id = ID from table where (conditions).
However, there wasn't a clause to check for status = 'O' (open). Remember Sybase saves the last returned record into a variable.
~~~~~
I'm being asked to give the testing team something that will make the results not work. If Sybase selects the above in an unordered list, it could appear in ascending order, or, if the database engine needs to change blocks of stored data or something technical magic stuff, the order could be messed up. The original error was when the procedure would return say 1234 instead of 1236.
Is there a way that I can have a 100% guarantee that Sybase will search over a block of data and have to double back, effectively 'breaking' the ascending search, and returning not the last record, but any other one? (all records except the maximum will end up erroring, because they are all 'Closed')
I want some sort of magical SQL code that will make sure the table is not searched in exactly numeric order. Ideally I'd like to not have to change the procedure, as the test team want to see the exact same procedure breaking (as easily as plonking in an order by id desc would fudge the results).
If you don't specify an order, there is no way to guarantee the order in which the results are returned. It depends on how the index is built, which in turn can depend on the order of insertion, the type of index, and the content of the index keys.
It's generally a bad idea to do those sorts of singleton SELECTs. You should always identify a specific record with the WHERE clause, or use a cursor, TOP n, or similar. The problem comes when someone tries to understand your code, because when some databases see multiple hits they take the first value, some take the last value, some take a random value (they call that "implementation-defined"), and some throw an error.
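As a rough illustration of "specify the record explicitly", here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Sybase (an assumption, as is the table name `tickets`). Instead of relying on whichever row the engine happens to visit last, the query asks for the open record directly:

```python
import sqlite3

# In-memory SQLite database standing in for the real Sybase table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO tickets (id, status) VALUES (?, ?)",
    [(1234, "C"), (1235, "C"), (1236, "O")],
)

# Ask for the open record explicitly instead of depending on scan order.
row = conn.execute(
    "SELECT MAX(id) FROM tickets WHERE status = 'O'"
).fetchone()
open_id = row[0]
print(open_id)
```

With the status filter and the aggregate in place, the result no longer depends on the order in which rows are read.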
Is this by any chance related to 1156837? :)
I have designed a function that works with an SSRS report. I have a drop down parameter that lists multiple items and only one can be selected. This drop down gets its data from a query/data set, and I added one line of data that says 'All' in it. So the dropdown will look like this:
Item1
Item2
Item3
All
And then in the function, I make one small change in the where clause:
...where (@parameterName = 'All' or table.name = @parameterName)
The problem is that table.name has about 50,000 rows of data. When the user selects 'All' in the drop down, I would have thought that, since the first condition in the brackets is true, the next condition (after the 'or') would not even be evaluated. Instead the query runs for 5-20 minutes and still does not produce any result. If I simply change the where clause to
...where (@parameterName = 'All')
The function runs in less than a second, if the user still selects 'All' from the drop down.
I implement a similar concept with another filter but I guess because the table that that parameter uses is much smaller (about 90 rows), so it doesn't take long.
Is there basically a way to have an optional parameter that is not expensive to calculate?
EDIT: I will add that the parameter is declared as nvarchar(max). Will changing this to something smaller help the query?
What you have there is a catch-all query. Consider adding OPTION (RECOMPILE) to the end of your statement. This'll force the engine to recreate the plan each time it runs the query, meaning it won't use poor choices based on a previous run where your variable has a value like 'Item1'.
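A different technique from OPTION (RECOMPILE), sometimes used for the same catch-all problem, is to build the statement so that the optional predicate is simply left out when 'All' is chosen. The sketch below shows the idea in Python with sqlite3 as a stand-in database; the `items` table and the `build_query` helper are hypothetical, and whether this fits an SSRS data set depends on how the query is issued:

```python
import sqlite3

def build_query(name_filter):
    """Return (sql, params), omitting the name predicate when 'All' is chosen."""
    sql = "SELECT name FROM items"
    params = []
    if name_filter != "All":
        # Only add the filter (as a bound parameter) when a real item is selected.
        sql += " WHERE name = ?"
        params.append(name_filter)
    return sql + " ORDER BY name", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)", [("Item1",), ("Item2",), ("Item3",)]
)

all_rows = [r[0] for r in conn.execute(*build_query("All"))]
one_row = [r[0] for r in conn.execute(*build_query("Item2"))]
```

Because each shape of the query is a separate statement, the engine can plan the filtered and unfiltered cases independently.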
Those of you who have actually dealt with RETS may be able to give me a hand here. The problem occurs when multiple properties are tied into the RETS data even though the property is sold. Basically what I need is to be able to check the database with the SELECT statement against three fields. The fields in question would be C_StreetName, C_StreetNumber, and C_PostalCode.
To make this clear, what I want is some way to check for duplicates while gathering the dataset. This can't be done in PHP because of how the data is returned through the application. So if it finds another record with the same C_StreetName, C_StreetNumber, and C_PostalCode it will remove them from the dataset. Ideally it would be nice if it could also check the Status of the two to find out if one is Expired or Sold before removing them from the data.
I'm not familiar with complex SQL functions. I was looking at the IF statement until I found that it can only be used while storing data, not the other way around. I also looked at the CASE statement, but it just doesn't seem like that would work.
If you guys have any suggestions on what I should use I'd appreciate it. Hopefully there is a way to do this and keep in mind this is only one table I am accessing I don't have any Joins.
Thanks in advance.
Here's something to get you going in the right direction. I haven't tested this, and am not sure you can nest a case expression inside max() in MySQL.
What this accomplishes is to output one row per unique combination of street name, number and postcode, with a status of 'Expired' or 'Sold' taking precedence over other values. That is, if there's a row with 'Expired' it will be output in preference to non-expired and non-sold, and a row with 'Sold' will be output if it exists, regardless of what other rows exist for that property. The case statement just converts the status codes into something orderable.
select
    C_StreetName,
    C_StreetNumber,
    C_PostalCode,
    max(
        case status
            when 'Expired' then 1
            when 'Sold' then 2
            else 0
        end) as status
from your_table  -- placeholder name: the original post omitted the FROM clause
group by
    C_StreetName,
    C_StreetNumber,
    C_PostalCode;
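To try the grouping idea without a MySQL server, here is a small sketch that runs the same kind of query through Python's sqlite3 module. The table name `listings` and the sample rows are made up for illustration, and SQLite is only a stand-in, so behaviour in MySQL should still be checked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings ("
    "C_StreetName TEXT, C_StreetNumber TEXT, C_PostalCode TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [
        ("Main St", "10", "12345", "Active"),
        ("Main St", "10", "12345", "Sold"),
        ("Oak Ave", "7", "54321", "Active"),
        ("Oak Ave", "7", "54321", "Expired"),
        ("Elm Rd", "3", "99999", "Active"),
    ],
)

# One row per address; the highest-ranked status wins (Sold > Expired > other).
rows = conn.execute(
    """
    SELECT C_StreetName, C_StreetNumber, C_PostalCode,
           MAX(CASE Status WHEN 'Expired' THEN 1 WHEN 'Sold' THEN 2 ELSE 0 END)
               AS status_rank
    FROM listings
    GROUP BY C_StreetName, C_StreetNumber, C_PostalCode
    ORDER BY C_StreetName
    """
).fetchall()
```

Each address appears once, and `status_rank` tells you whether any of its rows was Sold (2) or Expired (1), so the caller can filter those out if needed.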
Again MSDN does not really explain in plain English the exact difference, or the information for when to choose one over the other.
CHECKSUM
Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.
BINARY_CHECKSUM
Returns the binary checksum value computed over a row of a table or over a list of expressions. BINARY_CHECKSUM can be used to detect changes to a row of a table.
It does hint that binary checksum should be used to detect row changes, but not why.
Check out the following blog post that highlights the differences.
http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/
Adding info from this link:
The key intent of the CHECKSUM functions is to build a hash index based on an expression or a column list. If say you use it to compute and store a column at the table level to denote the checksum over the columns that make a record unique in a table, then this can be helpful in determining whether a row has changed or not. This mechanism can then be used instead of joining with all the columns that make the record unique to see whether the record has been updated or not. SQL Server Books Online has a lot of examples on this piece of functionality.
A couple of things to watch out for when using these functions:
You need to make sure that the column(s) or expression order is the same between the two checksums that are being compared else the value would be different and will lead to issues.
We would not recommend using checksum(*) since the value that will get generated that way will be based on the column order of the table definition at run time which can easily change over a period of time. So, explicitly define the column listing.
Be careful when you include the datetime data-type columns since the granularity is 1/300th of a second and even a small variation will result into a different checksum value. So, if you have to use a datetime data-type column, then make sure that you get the exact date + hour/min. i.e. the level of granularity that you want.
There are three checksum functions available to you:
CHECKSUM: This was described above.
CHECKSUM_AGG: This returns the checksum of the values in a group and Null values are ignored in this case. This also works with the new analytic function’s OVER clause in SQL Server 2005.
BINARY_CHECKSUM: As the name states, this returns the binary checksum value computed over a row or a list of expressions. The difference between CHECKSUM and BINARY_CHECKSUM is in the value generated for the string data-types. An example of such a difference is the values generated for “DECIPHER” and “decipher” will be different in the case of a BINARY_CHECKSUM but will be the same for the CHECKSUM function (assuming that we have a case insensitive installation of the instance).
Another difference is in the comparison of expressions. BINARY_CHECKSUM() returns the same value if the elements of two expressions have the same type and byte representation. So, “2Volvo Director 20” and “3Volvo Director 30” will yield the same value, however the CHECKSUM() function evaluates the type as well as compares the two strings and if they are equal, then only the same value is returned.
Example:
STRING               BINARY_CHECKSUM_USAGE   CHECKSUM_USAGE
-------------------  ----------------------  --------------
2Volvo Director 20   -1356512636             -341465450
3Volvo Director 30   -1356512636             -341453853
4Volvo Director 40   -1356512636             -341455363
HASHBYTES with MD5 is 5 times slower than CHECKSUM. I tested this on a table with over 1 million rows, and ran each test 5 times to get an average.
Interestingly CHECKSUM takes exactly the same time as BINARY_CHECKSUM.
Here is my post with the full results published:
http://networkprogramming.wordpress.com/2011/01/14/binary_checksum-vs-hashbytes-in-sql/
I've found that checksum collisions (i.e. two different values returning the same checksum) are more common than most people seem to think. We have a table of currencies, using the ISO currency code as the PK. And in a table of less than 200 rows, there are three pairs of currency codes that return the same Binary_Checksum():
"ETB" and "EUR" (Ethiopian Birr and Euro) both return 16386.
"LTL" and "MDL" (Lithuanian Litas and Moldovan leu) both return 18700.
"TJS" and "UZS" (Somoni and Uzbekistan Som) both return 20723.
The same happens with ISO culture codes: "de" and "eu" (German and Basque) both return 1573.
Changing Binary_Checksum() to Checksum() fixes the problem in these cases...but in other cases it may not help. So my advice is to test thoroughly before relying too heavily on the uniqueness of these functions.
Be careful when using CHECKSUM; you may get an unexpected outcome. The following statements produce the same checksum value:
SELECT CHECKSUM (N'这么便宜怎么办?廉价iPhone售价再曝光', 5, 4102)
SELECT CHECKSUM (N'PlayStation Now – Sony startet Spiele-Streaming im Sommer 2014', 238, 13096)
It's easy to get collisions from CHECKSUM(). HASHBYTES() was added in SQL 2005 to enhance SQL Server's system hash functionality, so I suggest you also look into this as an alternative.
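To make the change-detection idea concrete outside SQL Server, here is a minimal Python sketch. It does not reproduce CHECKSUM or HASHBYTES; it swaps in SHA-256 from the standard library, and the column names and the `row_hash` helper are hypothetical. It illustrates two points raised above: a wider hash makes accidental collisions far less likely, and the column order has to be fixed explicitly for two hashes to be comparable.

```python
import hashlib

def row_hash(row, columns):
    """Hash the named columns of a row (a dict) in the given, explicit order."""
    # A separator byte between values keeps ('ab', 'c') distinct from ('a', 'bc').
    payload = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

COLUMNS = ["code", "name"]  # hypothetical column list, spelled out explicitly
old = {"code": "EUR", "name": "Euro"}
new = {"code": "EUR", "name": "Euro (EU)"}

unchanged = row_hash(old, COLUMNS) == row_hash(dict(old), COLUMNS)
changed = row_hash(old, COLUMNS) != row_hash(new, COLUMNS)
order_matters = row_hash(old, ["code", "name"]) != row_hash(old, ["name", "code"])
```

Storing such a hash next to the row lets a later comparison detect a change without comparing every column, at the cost of computing a slower hash.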
You can get checksum value through this query:
SELECT
checksum(definition) as Checksum_Value,
definition
FROM sys.sql_modules
WHERE object_id = object_id('RefCRMCustomer_GetCustomerAdditionModificationDetail');
Replace the procedure name inside the parentheses with your own.
I have an as400 table containing roughly 1 million rows of full names / company names which I would like to convert to use another datastore while still matching the speed of the original.
Currently, a user enters the search and almost instantaneously gets the alphabetical position of the search term in the table and a page of matches. The user can then paginate either up or down through the records very quickly.
There is almost no updating of the data and approximately 50 inserts per week. I'm thinking that any database can maintain an alphabetical index of the names, but I'm unsure of how to quickly find the position of the search within the dataset. Any suggestions are greatly appreciated.
This sounds just like a regular pagination of results, except that instead of going to a specific page based on a page number or offset being requested, it goes to a specific page based on where the user's search fits in the results alphabetically.
Let's say you want to fetch 10 rows after this position, and 10 rows before.
If the user searches for 'Smith', you could do two selects such that:
SELECT
name
FROM
companies
WHERE
name < 'Smith'
ORDER BY
name DESC
LIMIT 10
and then
SELECT
name
FROM
companies
WHERE
name >= 'Smith'
ORDER BY
name
LIMIT 10
You could do a UNION to fetch that in one query; the above is just simplified.
The term the user searched for would fit half way through these results. If there are any exact matches, then the first exact match will be positioned such that it is eleventh.
Note that if the user searches for 'aaaaaaaa' then they'll probably just get the 10 first results with nothing before it, and for 'zzzzzzzz' they may get just the 10 last results.
I'm assuming that the SQL engine in question allows >= and < comparisons between strings (and can optimise that in indexes), but I haven't tested this, maybe you can't do this. If, like MySQL, it supports internationalized collations then you could even have the ordering done correctly for non-ascii characters.
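The two selects above can be tried out with a small Python sketch using sqlite3 as a stand-in engine (an assumption; the sample names and the `page_around` helper are invented for illustration). It fetches the rows just before and just after the search term and stitches them into one page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
names = ["Adams", "Baker", "Jones", "Smith", "Smithson", "Taylor", "Young"]
conn.executemany("INSERT INTO companies VALUES (?)", [(n,) for n in names])

def page_around(term, size):
    """Return up to `size` names before `term` and up to `size` names from `term` on."""
    before = conn.execute(
        "SELECT name FROM companies WHERE name < ? ORDER BY name DESC LIMIT ?",
        (term, size),
    ).fetchall()
    after = conn.execute(
        "SELECT name FROM companies WHERE name >= ? ORDER BY name LIMIT ?",
        (term, size),
    ).fetchall()
    # The "before" rows come back in descending order, so reverse them for display.
    return [r[0] for r in reversed(before)] + [r[0] for r in after]

page = page_around("Smith", 2)
```

Paging further up or down works the same way: use the first or last name on the current page as the new boundary value.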
If by "the position of the search" you mean the number of the record if they were enumerated alphabetically, you may want to try something like:
select count(*) from companies where name < 'Smith'
Most databases ought to optimize that reasonably well (but try it--theories you read on the web don't trump empirical data).
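A quick way to check the count-based position is a sketch like the following, again with Python's sqlite3 as a stand-in engine and invented sample data. The `position_of` helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("Adams",), ("Baker",), ("Jones",), ("Smith",), ("Taylor",)],
)

def position_of(term):
    """Zero-based alphabetical position: the number of names sorting before `term`."""
    return conn.execute(
        "SELECT COUNT(*) FROM companies WHERE name < ?", (term,)
    ).fetchone()[0]

pos = position_of("Smith")
```

Timing this against a realistic row count on the target database is the only way to know whether it is fast enough there.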
Just to add to the ordering suggestions:
Add an index to the name if this is your standard means of data retrieval.
You can paginate efficiently by combining LIMIT and OFFSET.
I have a bunch of records in several tables in a database that have a "process number" field, that's basically a number, but I have to store it as a string both because of some legacy data that has stuff like "89a" as a number and some numbering system that requires that process numbers be represented as number/year.
The problem arises when I try to order the processes by number. I get stuff like:
1
10
11
12
And the other problem is when I need to add a new process. The new process' number should be the biggest existing number incremented by one, and for that I would need a way to order the existing records by number.
Any suggestions?
Maybe this will help.
Essentially:
SELECT process_order FROM your_table ORDER BY process_order + 0 ASC
Can you store the numbers as zero padded values? That is, 01, 10, 11, 12?
I would suggest to create a new numeric field used only for ordering and update it from a trigger.
Can you split the data into two fields?
Store the 'process number' as an int and the 'process subtype' as a string.
That way:
• you can easily get the MAX processNumber, and increment it when you need to generate a new number
• you can ORDER BY processNumber ASC, processSubtype ASC to get the correct order, even if multiple records have the same base number with different years/letters appended
• when you need the 'full' number you can just concatenate the two fields
Would that do what you need?
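As a sketch of how the existing strings might be split for such a migration, the Python snippet below separates the leading digits from whatever follows. The `split_process_number` helper and the sample values are assumptions; real legacy data may contain formats this pattern does not cover:

```python
import re

def split_process_number(value):
    """Split '89a' into (89, 'a') and '12/2009' into (12, '/2009')."""
    match = re.match(r"(\d+)(.*)", value)
    if match is None:
        raise ValueError(f"no leading digits in {value!r}")
    return int(match.group(1)), match.group(2)

values = ["10", "1", "89a", "12/2009", "11", "89"]

# Sorting by (number, suffix) gives numeric order with the suffix as a tie-breaker.
ordered = sorted(values, key=split_process_number)

# The next process number is the largest numeric part plus one.
next_number = max(split_process_number(v)[0] for v in values) + 1
```

The two tuple elements map directly onto the int and string columns described above.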
Given that your process numbers don't seem to follow any fixed patterns (from your question and comments), can you construct/maintain a process number table that has two fields:
create table process_ordering ( processNumber varchar(N), processOrder int )
Then select all the process numbers from your tables and insert into the process number table. Set the ordering however you want based on the (varying) process number formats. Join on this table, order by processOrder and select all fields from the other table. Index this table on processNumber to make the join fast.
select my_processes.*
from my_processes
inner join process_ordering on my_processes.processNumber = process_ordering.processNumber
order by process_ordering.processOrder
It seems to me that you have two tasks here.
• Convert the strings to numbers by legacy format/strip off the junk
• Order the numbers
If you have a practical way of introducing string-parsing regular expressions into your process (and your issue has enough volume to be worth the effort), then I'd
• Create a reference table such as
CREATE TABLE tblLegacyFormatRegularExpressionMaster(
LegacyFormatId int,
LegacyFormatName varchar(50),
RegularExpression varchar(max)
)
• Then, with a way of invoking the regular expressions, such as the CLR integration in SQL Server 2005 and above (the .NET Common Language Runtime integration, which allows calls to compiled .NET methods from within SQL Server as ordinary (Microsoft-extended) T-SQL), you should be able to solve your problem.
• See
http://www.codeproject.com/KB/string/SqlRegEx.aspx
I apologize if this is way too much overhead for your problem at hand.
Suggestion:
• Make your column a fixed width text (i.e. CHAR rather than VARCHAR).
• Pad the existing values with enough leading zeros to fill each column and a trailing space(s) where the values do not end in 'a' (or whatever).
• Add a CHECK constraint (or equivalent) to ensure new values conform to the pattern e.g. something like
CHECK (process_number LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][ab ]')
• In your insert/update stored procedures (or equivalent), pad any incoming values to fit the pattern.
• Remove the leading/trailing zeros/spaces as appropriate when displaying the values to humans.
Another advantage of this approach is that the incoming values '1', '01', '001', etc would all be considered to be the same value and could be covered by a simple unique constraint in the DBMS.
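The padding rule can be sketched in a few lines of Python. This assumes six digits plus one suffix slot, matching the CHECK pattern above, and the `pad_process_number` helper is hypothetical. The same logic would live in the insert/update procedures in practice:

```python
def pad_process_number(value, digits=6):
    """Pad '89a' to '000089a' and '12' to '000012 ' (digits plus one suffix slot)."""
    has_suffix = value[-1].isalpha()
    suffix = value[-1] if has_suffix else " "
    number = value[:-1] if has_suffix else value
    return number.zfill(digits) + suffix

values = ["10", "1", "89a", "89", "2"]

# Plain string sorting now matches numeric order because every value has the same width.
padded = sorted(pad_process_number(v) for v in values)

# Strip the padding again when showing values to people.
displayed = [(p.lstrip("0").rstrip() or "0") for p in padded]
```

Because '1', '01' and '001' all pad to the same stored string, a simple unique constraint on the column catches them as duplicates.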
BTW I like the idea of splitting the trailing 'a' (or whatever) into a separate column, however I got the impression the data element in question is an identifier and therefore would not be appropriate to split it.
You need to cast your field as you're selecting. I'm basing this syntax on MySQL - but the idea's the same:
select * from table order by cast(field AS UNSIGNED);
Of course UNSIGNED could be SIGNED if required.
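The cast-based ordering can be tried with Python's sqlite3 module. Note the assumptions: SQLite has no UNSIGNED type, so the sketch casts to INTEGER instead, and the `processes` table and its rows are invented. SQLite, like MySQL, uses the leading digits of a string such as '89a' when casting, but other engines may raise an error instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processes (process_number TEXT)")
conn.executemany(
    "INSERT INTO processes VALUES (?)",
    [("10",), ("1",), ("89a",), ("11",), ("2",)],
)

# Order by the numeric value of the leading digits rather than by the text.
ordered = [
    r[0]
    for r in conn.execute(
        "SELECT process_number FROM processes "
        "ORDER BY CAST(process_number AS INTEGER)"
    )
]

# The same cast gives the next number to assign.
next_number = conn.execute(
    "SELECT MAX(CAST(process_number AS INTEGER)) + 1 FROM processes"
).fetchone()[0]
```

Casting in the ORDER BY usually prevents the engine from using an index on the column for sorting, so this suits small or infrequently sorted tables best.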