I had a quick question.
I need to compare two strings in SQL in natural order, so if I have strings like '20091210' and '20101213', the latter would be greater. The string could also contain alpha characters, so 'Y4550' would be greater than 'Y4500'. I tried using the CHECKSUM system function to convert the string to a hashed number, but that isn't giving me a number that respects natural order.
Do you know of anything that I can use aside from making a CLR function?
If I'm understanding your question right, you want to compare two string columns in the same row, or a column against a variable. For that, you can simply use the < and > operators:
SELECT * FROM Users WHERE Username > 'Tom'
That will return any users whose username falls alphabetically after "Tom."
If we were talking about multiple records, ORDER BY will do the trick:
SELECT * FROM Users ORDER BY Username
That will sort users by their username, in ascending alphabetical order.
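If it helps to see the comparison behavior concretely, here is a minimal sketch using Python's sqlite3 (the table and usernames are invented for illustration); string comparison and ORDER BY behave the same way for these ASCII examples in SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Username TEXT)")
conn.executemany("INSERT INTO Users VALUES (?)",
                 [("Alice",), ("Tom",), ("Walter",), ("Zoe",)])

# WHERE with > compares strings lexicographically; ORDER BY added so the
# result order is deterministic
after_tom = [r[0] for r in conn.execute(
    "SELECT Username FROM Users WHERE Username > 'Tom' ORDER BY Username")]
print(after_tom)  # ['Walter', 'Zoe']

# ORDER BY sorts ascending alphabetically by default
ordered = [r[0] for r in conn.execute(
    "SELECT Username FROM Users ORDER BY Username")]
print(ordered)  # ['Alice', 'Tom', 'Walter', 'Zoe']
```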
I have an Azure Search Index that I'm hitting from an AngularJS app and I've run into a bit of a problem. One of my tables has a property that defines a "Sort Order" for the rows in the table, which is going to be used in the app to determine what order to display the items in the table to the user. Sort Order is an integer, and the goal here is when an admin user defines the "Sort Order" as 1, that item will appear first, 2 will appear second, etc.
My preferred solution would be to just sort the results from my query by ascending and spit that out to the user, but I've run into a problem with my null values, since Sort Order isn't a required field. All my models are written in C#, so null entries are being set to 0, and thus the entries without a sort order defined would be displayed first if ordered by ascending, which is the exact opposite of what I want.
There's a lot of different ways I could deal with this, but the easiest would be if I could just set the SortOrder property as a nullable int. The only problem with this is I'm not sure how Azure Search treats null values with its OrderBy statement. My initial thought was, if sorted by ascending, it would list all the numerical values in ascending order first, then kick the null values down to the end. If that's the case, then that's perfect, but I'm just looking to find out if that is indeed the case.
So, in short, my question is: Would Azure Search treat null values as greater than or less than a defined integer value? Put another way, if I OrderBy ascending integers, would the null values appear at the top or the bottom?
Azure Search will place documents whose $orderby column is null toward the end of the results. This is true regardless of whether you order "descending" or "ascending".
If there are multiple documents where the $orderby column is null, rest assured that all of them would be placed near the end, but you should ideally treat the ordering amongst the "null-valued" documents as undefined.
In my experience the above answer is wrong: if there are null values in the field, the null values appear first when the sort is ascending and last when the sort is descending.
You can refer to the Azure documentation on this: search-query-odata-orderby
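For reference, the OData syntax in question looks like this (SortOrder is the field name from the question; the null placement for each direction is what the search-query-odata-orderby docs page describes):

```
$orderby=SortOrder asc
$orderby=SortOrder desc
```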
In SQL Server, how do I force values to sort according to certain logic? As far as I understand, SQL puts special characters first, then numbers, and then letters when sorting values.
Now, I need the underscore to come after the letters. For example,
I have a value of OA_G and a range between MRI and OL5
Currently SQL puts OA_G within the range, but I need to force it to be outside the range.
Your premise seems incorrect. Regardless of how SQL Server sorts string data (which, by the way, is controlled by the database's collation), if one were to sort these values manually, the result you seek could not be achieved. To wit:
Given the three strings, "MRI", "OL5", and "OA_G", sort these in ascending order by string value.
Of course, this would be done character-by-character, comparing each character from left to right.
Since "M" comes before "O", the first member of the sorted set would be "MRI". Next, compare "OL5" and "OA_G". The letter "O" is the same, so check the next position. "L" is greater than "A" because "L" is the 12th letter of the alphabet and "A" is the first, so the next member would be "OA_G", leaving "OL5" as the final member.
There is no SQL Server collation order that would make this evaluation give you the results you're seeking.
If the data values such as MRI, OL5, and OA_G are constants or rarely change, you can make a lookup table like
Part | Priority
MRI | 1
OL5 | 2
OA_G | 3
Join it with your table and ORDER BY Priority
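A minimal sketch of this lookup-table approach, using Python's sqlite3 (table and column names are invented for illustration; the same JOIN and ORDER BY work in SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (Part TEXT);
    CREATE TABLE PartPriority (Part TEXT, Priority INTEGER);
    INSERT INTO Data VALUES ('OA_G'), ('MRI'), ('OL5');
    INSERT INTO PartPriority VALUES ('MRI', 1), ('OL5', 2), ('OA_G', 3);
""")

# Joining on the lookup table lets ORDER BY use the explicit priority
# instead of the collation's character order
rows = [r[0] for r in conn.execute("""
    SELECT d.Part
    FROM Data d
    JOIN PartPriority p ON p.Part = d.Part
    ORDER BY p.Priority
""")]
print(rows)  # ['MRI', 'OL5', 'OA_G']
```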
I am trying to take a text string and create a unique numerical value from it and I am not having any luck.
For example, I have user names (first and last) and birthdate. I have tried taking these values and converting them to varbinary, which does give me a numerical value from the data, but it isn't unique. Out of ~700 records, I will get at least 100 numerical values that are duplicated, even though the first name, last name, and birthdate used to generate the number are different.
Here is some code I have been trying:
SELECT CONVERT(VARCHAR(300),
       CONVERT(BIGINT,
               CONVERT(VARBINARY, SE.FirstName) + CONVERT(VARBINARY, SE.BirthDate)))
FROM ELIGIBILITY SE
If I use that code to convert the following data, the result is 3530884780910457344 in every case, so the same number is generated from distinct data:
David 12/03/1952
Janice 12/23/1952
Michael 03/24/1952
Mark 12/23/1952
I am looking for some way, the simpler the better, to take these values and generate a unique numerical value from them. The reason I need to use these values as input is that I want to avoid creating duplicates in the future and to be able to predict the numerical value from the formula. This is why NEWID() won't work for me.
How about simply:
SELECT CHECKSUM(name, BirthDate) FROM dbo.ELIGIBILITY;
Of course, since there are still chances for collisions, maybe you should better define what you are actually trying to do. You've stated some reasons why e.g. NEWID() won't work, but I still don't follow the underlying purpose of this unique number.
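If a deterministic number with a far lower collision probability than CHECKSUM is acceptable, one option (sketched here in Python rather than T-SQL; field names follow the question) is to derive a 64-bit integer from a cryptographic hash of the concatenated fields. Note this is still not guaranteed unique, just vastly less likely to collide:

```python
import hashlib

def stable_id(first_name: str, birth_date: str) -> int:
    """Derive a deterministic 64-bit integer from the two fields.

    The separator avoids ambiguity between e.g. ('ab', 'c') and ('a', 'bc').
    """
    payload = f"{first_name}|{birth_date}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big")

rows = [("David", "12/03/1952"), ("Janice", "12/23/1952"),
        ("Michael", "03/24/1952"), ("Mark", "12/23/1952")]
ids = [stable_id(*r) for r in rows]
print(len(set(ids)) == len(ids))  # distinct inputs get distinct ids here
```

The same inputs always produce the same number, which satisfies the "predictable from the formula" requirement in the question.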
When I run
select array[19,21,500] <= array[23,5,0];
I get true.
but when I run
select array[24,21,500] <= array[23,5,0];
I get false. This suggests that the comparison is only on the first element.
I am wondering if there is an operator or possibly function that compares all the entries such that if all the entries in the left array are less than those in the right array (at the same index) it would return true, otherwise return false.
I'm hoping to retrieve all the rows that have an entire array "less than" or "greater than" a given array. I don't know if this is possible.
Arrays use ordinality as a basic property. In other words '{1,3,2}' <> '{1,2,3}' and this is important to understand when looking at comparisons. These look at successive elements.
Imagine for a moment that PostgreSQL didn't have an inet type. We could use int[] to specify CIDR blocks. For example, we could use '{10,0,0,1,8}' to represent 10.0.0.1/8. We could then compare IP addresses in this way. We could also represent the address as a bigint: '{167772161,8}'. In this sort of comparison, if you have two IP addresses with different subnets, we can compare them and the one with the more specific subnet would come after the one with the less specific subnet.
One of the basic principles of database normalization is that each field should hold one and only one value for its domain. One reason arrays don't necessarily violate this principle is that, since they have ordinality (and thus act as a tuple rather than a set or a bag), you can use them to represent singular values. The comparisons make perfect sense in that case.
In the case where you want to create an operator which does not respect ordinality, you can create your own. Basically, you write a function that returns a bool given the two arrays, and then wrap it in an operator (see CREATE OPERATOR in the docs for more on how to do this). You are by no means limited to what PostgreSQL offers out of the box.
To actually conduct the operation you asked for, use unnest() in parallel and aggregate with bool_and():
SELECT bool_and(a < b) -- each element < corresponding element in 2nd array
,bool_and(a <= b)
,bool_and(a >= b)
,bool_and(a > b)
-- etc.
FROM (SELECT unnest('{1,2,3}'::int[]) AS a, unnest('{2,3,4}'::int[]) AS b) t
Both arrays need to have the same number of base elements to be unnested in parallel. Otherwise you get a CROSS JOIN, i.e. a completely different result.
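The element-wise semantics being emulated here can be sketched in plain Python (the function name is my own):

```python
import operator

def pairwise_all(a, b, op):
    """True iff op(x, y) holds for every aligned pair of elements.

    Mirrors the bool_and(unnest(a) op unnest(b)) pattern; like the SQL
    version, it only makes sense for arrays of equal length.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of elements")
    return all(op(x, y) for x, y in zip(a, b))

print(pairwise_all([1, 2, 3], [2, 3, 4], operator.lt))       # True
print(pairwise_all([19, 21, 500], [23, 5, 0], operator.le))  # False (21 > 5)
```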
I'm reading the well known book "The C programming Language, 2nd edition" and there is one exercise that I'm stuck with. I can't figure out what exactly needs to be done, so I would like for someone to explain it to me.
It's Exercise 5-17:
Add a field-searching capability, so sorting may be done on fields within lines, each field sorted according to an independent set of options.
What does the input program expect from the command line; what does it mean by "independent set of options"?
Study the POSIX sort utility, ignoring the legacy options. Or study the GNU sort program; it has even more options than POSIX sort does.
You need to decide between fixed-width fields as suggested by Neil Butterworth in his answer and variable-width fields. You need to decide on what character separates variable-width fields. You need to decide on which sorting modes to support for each field (string, case-folded string, phone-book string, integer, floating point, date, etc) as well as sort direction (forward/reverse or ascending/descending).
The 'independent options' means that you can have different sort criteria for different fields. That is, you can arrange for field 1 to be sorted in ascending string order, field 3 to be sorted in descending integer order, and field 9 to be sorted in ascending date order.
Note that when sorting, the primary criterion is the first key field specified. When two rows are compared, if there is a difference between the first key field in the two rows, then the subsequent key fields are never considered. When two rows are the same in the first key field, then the criterion for the second key field determines the relative order; then, if the second key fields are the same, the third key field is consulted, and so on. If there are no more key fields specified, then the usual default sort criterion is "the whole line of input in ascending string order". A stable sort preserves the relative order of two rows in the original data that are the same when compared using the key field criteria (instead of using the default, whole-line comparison).
It's referring to the ability to specify subfields in each row to sort by. For example:
sort -f1:4a -f20:28d somefile.txt
would sort on the field beginning at character position 1 and extending to position 4, ascending, and within that sort on the field beginning at position 20 and extending to position 28, descending.
Of course, there are lots of other ways to specify fields, sort order etc. Designing the command line switches is one of the points of the exercise, IMHO.
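The multi-key comparison being described can be sketched in Python (the fixed-width field layout is invented for illustration, not K&R's; negating a numeric key is a common trick to flip its sort direction while the other key stays ascending):

```python
# Each line: a 20-char name field followed by an 8-char numeric field.
rows = [("bob", 42), ("alice", 7), ("bob", 100)]
lines = [f"{name:<20}{num:<8}" for name, num in rows]

def key(line):
    name = line[0:20].rstrip()   # primary key: ascending string field
    num = int(line[20:28])       # secondary key: integer field
    return (name, -num)          # negation sorts the second field descending

for line in sorted(lines, key=key):
    print(line.rstrip())
```

The tuple key gives exactly the behavior described above: the second field is only consulted when the first fields compare equal.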