How to compare a part of a string using like function SQL - sql-server

I'm not sure if like function can be used to compare strings or if there is another function to achieve this but this is my case, I have the below part:
R71-14-40000-ATN-LH-D-PF, for the third segment (40000) which is the length; the first 3 digits are the integer part and last 2 digits are the decimals.
I would like to get all parts from DB where the length (third segment) is equals or greater than that value for example, if I use the above part I should get the yellow values an omit the other ones (the values can also be R71-14-50000-ATN-LH-D-PF, R71-14-55000-ATN-LH-D-PF, R71-14-60000-ATN-LH-D-PF, not only start with 4 etc).
I tried this PartNum like '%R71-14-%-ATN-LH-D-PF%' but I get all parts no matter its third segment value

You can use a substring, I think:
where substring(col, 8, 5) >= substring('R71-14-40000-ATN-LH-D-PF', 8, 5)
Some databases use substr() rather than substring().

Using a more restriction LIKE value such as
PartNum LIKE 'R71-14-4____-ATN-LH-D-PF'
would answer the particular query for "values with the 3rd-segment starting with a 4". It could also be ..14-4%-ATN.., although I chose the _ match-exactly-one wildcard for explicitness of a fixed 3rd-segment length (5); it's also easier for the engine to match against.
Then expanding to for "equals or greater than 4" under this fixed-width data can be done by choosing the 3rd-segment starting with a 4, or 5, or 6..
PartNum LIKE 'R71-14-[456789]____-ATN-LH-D-PF'
This works in SQL Server, although there might be slight variations in different RDMBS implementations. This approach is lexical based, which works fine on single-character integer values even though it does not use/utilize numeric equality. SQL Server also supports character-negations that can be useful - see the documentation for the specific RDBMS.
The leading and trailing % are not needed per the shown data. Using a leading % can also be very detrimental to index usage.
The trailing % makes more sense if not caring about the remaining segments,
PartNum LIKE 'R71-14-[456789]____-%'
And if needing to only care about the 3rd-segment,
PartNum LIKE '___-__-[456789]____-%'
PartNum LIKE '___-__-[456789]%' -- or even this
Note the difference from the original query (..14-%-ATN..), which matches all values as expected. This is because it does not add any restrictions to the 3rd-segment value.

Related

Insert characters in SSIS

I have a requirement where I need to insert special characters for a particular column after the interval 2,3,3
example
ABCDEFGHIJKL -> AB-CDE-FGH-IJKL
I know I need to use Derived column in SSIS 2012 but I am stuck with the expression. I would really appreciate if anyone can help me out with the correct expression
The logic will be that you need to split your string at a given ordinal position into two pieces and then concatenate those pieces back together with your special character.
A derived column is going to be a bit ugly because the language doesn't have the power of the .net libraries. We'll make heavy use and abuse of SUBSTRING to get the job done
SUBSTRING(MyCol, 1, 2) + "-" + SUBSTRING(MyCol, 2, LEN(MyCol) -2)
That applies the first special character logic. To simplify matters, I would add this column as MyColStep1 to the data flow. I then add a second Derived Column task that uses the above logic but instead uses MyColStep1 as the input. Taking this approach will make it much, much easier to debug (since you can attach a data viewer on the output path of each component).
SUBSTRING(MyColStep1, 1, 6) + "-" + SUBSTRING(MyColStep1, 6, LEN(MyColStep1) -6)
Etc.

Binary data different when viewed with CFDUMP

I have a SQL Server database that has a table that contains a field of type varbinary(256).
When I view this binary field via a query in MMS, the value looks like this:
0x004BC878B0CB9A4F86D0F52C9DEB689401000000D4D68D98C8975425264979CFB92D146582C38D74597B495F87FEA09B68A8440A
When I view this same field (and same record) using CFDUMP, the value looks like this:
075-56120-80-53-10279-122-48-1144-99-21104-1081000-44-42-115-104-56-10584373873121-49-714520101-126-61-115116891237395-121-2-96-101104-886810
(For the example below, the original binary value will be #A, and the CFDUMP value above will be #B)
I have tried using CAST(#B as varbinary(256)) but didn't get the same value as #A.
What must I do to convert the value retrieved from CFDUMP into the correct binary representation?
Note: I no longer have the applicable records in the database. I need to convert #B into the correct value that can re-INSERT into a varbinary(256) field.
(Expanded from comments)
I do not mean this sarcastically, but what difference does it make how they display binary? It is simply a difference in how the data is presented. It does not mean the actual binary values differ.
It is similar to how dates are handled. Internally, they are a big numbers. But since most people do not know which date 1234567890 represents, applications chose to display the number in a more human friendly format. So SSMS might present the date as 2009-02-13 23:31:30.000, while CF might present it as {ts '2009-02-13 23:31:30'}. Even though the presentations differ, it still the same value internally.
As far as binary goes, SSMS displays it as hexadecimal. If you use binaryEncode() on your query column, and convert the binary to hex, you can see it is the same value. Just without the leading 0x:
writeDump( binaryEncode(yourQuery.binaryColumn, "hex") )
If you are having some other issue with binary, could you please elaborate?
Update:
Unfortunately, I do not think you can easily convert the cfdump representation back into binary. Unlike Railo's implementation, Adobe's cfdump just concatenates the numeric representation of the individual bytes into one big string, with no delimiter. (The dashes are simply negative numbers). You can reproduce this by looping through the bytes of your sample string. The code below produces the same string of numbers you posted.
bytes = binaryDecode("004BC878B0CB9A4F...", "hex");
for (i=1; i<=arrayLen(bytes); i++) {
WriteOutput( bytes[i] );
}
I suppose it is theoretically possible to convert that string into binary, but it would be very difficult. AFAIK, there is no way to accurately determine where one number (or byte) begins and the other ends. There are some clues, but ultimately it would come down to guesswork.
Railo's implementation, displays the byte values separated by a dash "-". Two consecutive dashes indicates a negative number. ie "0", "75", "-56", ...
0-75--56-120--80--53--102-79--122--48--11-44--99--21-104--108-1-0-0-0--44--42--115--104--56--105-84-37-38-73-121--49--71-45-20-101--126--61--115-116-89-123-73-95--121--2--96--101-104--88-68-10
So you could probably parse that string back into an array of bytes. Then insert the binary into your database using <cfqueryparam cfsqltype="CF_SQL_BINARY" ..>. Unfortunately that does not help you, but the explanation might help the next guy.
At this point, I think your best bet is to just restore the data from a database backup.

Is it better to maintain 3 small columns or 1 large column in a Table?

Three small number columns [Number(1)] >>
OptionA | 0/1
OptionB | 0/1
OptionC | 0/1
or one larger string column [Varchar2(29)] >>
Options | OptionA=0/1|OptionB=0/1|OptionC=0/1
I'm not sure about the way database handles tables, but I think that maintaining three columns as Number(1) is better than one column as Varchar2(29) !
-EDIT-
Let me explain the situation a bit more:
I am working on a common framework where the all incoming/outgoing request/response is tracked, these interactions can be channeled to a DB/File/JMS; now the all the configuration is being loaded from a table which has a column that corresponds to the output type, currently I'm using "DB=1|FILE=1|JMS=0" as the value of that column so that later if anyone wants to add this for their module they can easily understand what is going on, in my code I've written a simple logic which splits the string by "|" and then I use the exclusive or operator to switch between choice using a switch case..
Everything is already done but I don't like the idea of one large column is better than three small + it will remove the split string I'm doing.
-EDIT-
I finally got it clarified, there may be a situation where we have to add more options; in that case if we add the data column wise, it will result in modifying the table + changing the entity + adding more if's n all; on the other hand I ended up making an enum out of it in a simple bit wise logic to switch between options; this way, I need to modify the enum & add a new handler for the new option & then we are good to go.
Using a single column to store multiple pieces of data is probably the worst thing you can do in a database.
Violating first normal form has at least the following disadvantages:
More difficult to query. OptionA = 1 and OptionB = 1 and OptionC = 0 versus substr(options, 9, 1) = '1' and substr(options, 19, 1) = '1' and substr(options, 19, 1) = '0'.
Less flexable. What happens when you need to add another option? Adding a new column is easy. Adding a new format could mess up old queries. For example, if someone tries to read OptionC with substr(options, -1, 1). (Although this is a good reason to use a 3rd option - a separate table.)
No type safety. This can be a very subtle and tricky problem. Let's say you write substr(options, 9, 1) = 1 instead of substr(options, 9, 1) = '1'. If anyone ever gets the format wrong, a single value could ruin lots of queries. Or worse, it only intermittently crashes a small number of queries, because the access paths keep changing. (Although you can prevent this with a check constraint.)
Slower queries. Normally the amount of work done in an expression or condition isn't a significant cost for a query. But adding a lot of unnecessary string manipulation can make a difference.
Less optimizing. Oracle can only build efficient query plans if it can understand your data. For example, let's say that OptionA is "0" 99.9% of the time. When you filter OptionA = 0, Oracle can use a histogram make a very accurate prediction about the number of rows returned. But for substr(options, 9, 1) = '1' you'll only get a wild guess. If you have complicated queries using this columns you may spend a lot of time trying to "fix" the cardinality estimates. (Although maybe expression statistics could help with this?)
There are times when denormalizing is a good idea. For example, if you have terabytes of data, and compress the table, the single column may take up less space. (But if you're trying to save space, why not use a format like "000" instead?).
If there really is a good reason for this, it definitely needs to be documented. Perhaps add a comment on the column.
For a start, if I am reading your question right, you want each of the options to have one of just two possible values, correct?
If so then you could:
have a separate integer (or boolean) column for each option
have an options column that is a string of 1's and 0's, one digit for each options e.g. "001"
use an 'options' column that is an integer and use a bit value for each options, e.g. optionA == options & 1, optionB == options & 2 etc.
some databases have a bit vector data type which you could use. For mysql there is the BIT data type, which can store bit strings up to 64 bits long.
There will be a trade-off between code complexity and efficiency for each of these. Ask yourself, how much of the machine's time or storage will be saved by employing each of these options? And how much of your time will be saved?
In this instance the 3 column approach is the one I would recommend, not only does this keep things simple in terms of extracting data, but should you ever wish you could set values against all 3 columns rather than being limited to one VarChar2 field. If you opt for the single column VarChar2 then it is fairly simple to extract the info you need using the substr command or perhaps another variation, and although this isn't heavy work for an Oracle db, it does essentially put extra work on the server which is not necessary.

How can I find odd house numbers in MS SQL?

The problem is I need to ignore the stray Letters in the numbers: e.g. 19A or B417
Take a look here:
Extracting Numbers with SQL Server
There are several hidden "gotcha's" that are explained pretty well in the article.
It depends on how much data you're dealing with, but doing that in SQL is probably going to be slow. Not everyone will agree with me here, but I think all data processing should be done in application code.
I would just take the rows you want, and filter it in the application you're dealing with.
The easiest thing to do here would be to create a CLR function which takes the address. In the CLR function, you would take the first part of the address (assuming it is the house number), which should be delimited by whitespace.
Then, replace any non-numeric characters with an empty string.
You should have a string representing an integer at that point which you can pass to the Parse method on the Int32 class to produce an integer, which you can then check to see if it is odd.
I recommend a CLR function (assuming you are using SQL Server 2005 and above, and can set the compatibility level of the database) here because it's easier to perform string manipulations in .NET than it is in T-SQL.
Assuming [Address] is the column with the address in it...
Select Case Cast(Substring(Reverse(Address), PatIndex('%[0-9]%',
Reverse(Address)), 1) as Integer) % 2
When 0 Then 'Even'
When 1 Then 'Odd' End
From Table
I've been through this drill before. The best alternative is to add a column to the table or to a subsidiary joinable table that stores the inferred numerical value for the purpose. Then use iterative queries to set the column repeatedly until you get sufficient accuracy and coverage. You'll end up encountering stuff like "First, Third," "451a", "1200 South 19th Blvd East", and worse.
Then filter new and edited records as they occur.
As usual, UDF's should be avoided as being slow and (comparatively) less debuggable.

Phone Number Columns in a Database

In the last 3 companies I've worked at, the phone number columns are of type varchar(n). The reason being that they might want to store extensions (ext. 333). But in every case, the "-" characters are stripped out when inserting and updating. I don't understand why the ".ext" characters are okay to store but not the "-" character. Has any one else seen this and what explanation can you think of for doing it this way? If all you want to store is the numbers, then aren't you better off using an int field? Conversely, if you want to store the number as a string/varchar, then why not keep all the characters and not bother with formatting on display and cleaning on write?
I'm also interested in hearing about other ways in which phone number storage is implemented in other places.
Quick test: are you going to add/subtract/multiply/divide Phone Numbers? Nope. Similarly to SSNs, Phone Numbers are discrete pieces of data that can contain actual numbers, so a string type is probably most appropriate.
one point with storing phone numbers is a leading 0.
eg: 01202 8765432
in an int column, the 0 will be stripped of, which makes the phone number invalid.
I would hazard a guess at the - being swapped for spaces is because they dont actually mean anything
eg: 123-456-789 = 123 456 789 = 123456789
Personally, I wouldn't strip out any characters, as depending on where the phone number is from, it could mean different things. Leave the phone number in the exact format it was entered, as obviously that's the way the person who typed it in is used to seeing it.
It doesn't really matter how you store it, as long as it's consistent. The norm is to strip out formatting characters, but you can also store country code, area code, exchange, and extension separately if you have a need to query on those values. Again, the requirement is that it's consistent - otherwise querying it is a PITA.
Another reason I can think of not to store phone numbers as 'numbers' but as strings of characters, is that often enough part of the software stack you'd use to access the database (PHP, I am looking at you) wouldn't support big enough integers (natively) to be able to store some of the longer and/or exotic phone numbers.
Largest number that 32-bits can carry, without sign, is 4294967295. That wouldn't work for just any Russian mobile phone number, take, for instance, the number 4959261234.
So you have yourself an extra inconvenience of finding a way to carry more than 32-bits worth of number data. Even though databases have long supported very large integers, you only need one bad link in the chain for a showstopper. Like PHP, again.
Stripping some characters and allowing others may have an impact if the database table is going to drive another system, e.g. IP Telephony of some sort. Depending on the systems involved, it may be legitimate to have etc.333 as a suffix, whereas the developers may not have accounted for "-" in the string (and yes, I am guessing here...)
As for storing as a varchar rather than an int, this is just plain-ole common sense to me. As mentioned before, leading zeros may be stripped in an int field, the query on an int field may perform implicit math functions (which could also explain stripping "-" from the text, you don't want to enter 555-1234 and have it stored as -679 do you?)
In short, I don't know the exact reasoning, but can deduce some possibilities.
I'd opt to store the digits as a string and add the various "()" and "-" in my display code. It does get more difficult with international numbers. We handle it by having various "internationalized" display formats depending on country.
What I like to do if I know the phone numbers are only going to be within a specific region, such as North America, is to change the entry into 4 fields. 3 for area code, 3 for prefix, 3 for line, and maybe 5 for extension. I then insert these as 1 field with '-' and maybe an 'e' to designate extension. Any searching of course also needs to follow the same process. This ensures I get more regular data and even allows for the number to be used for actually making a phone call, once the - and the extension are removed. I can also get back to original 4 fields easily.
Good stuff! It seems that the main point is that the formatting of the phone number is not actually part of the data but is instead an aspect of the source country. Still, by keeping the extension part of the number as is, one might be breaking the model of separating the formatting from the data. I doubt that all countries use the same syntax/format to describe an extension. Additionally, if integrating with a phone system is a (possible) requirement, then it might be better to store the extension separately and build the message as it is expected. But Mark also makes a good point that if you are consistent, then it probably won't matter how you store it since you can query and process it consistently as well.
Thank you Eric for the link to the other question.
When an automated telephone system uses a field to make a phone call it may not be able to tell what characters it should use and which it should ignore in dialing. A human being may see a "(" or ")" or "-" character and know these are considered delimiters separating the area code, npa, and nxx of the phone number. Remember though that each character represents a binary pattern that, unless pre-programmed to ignore, would be entered by an automated dialer. To account for this it is better to store the equivalent of only the characters a user would press on the phone handset and even better that the individual values be stored in separate columns so the dialer can use individual fields without having to parse the string.
Even if not using dialing automation it is a good practice to store things you dont need to update in the future. It is much easier to add characters between fields than strip them out of strings.
In comment of using a string vs. integer datatype as noted above strings are the proper way to store phone numbers based on variations between countries. There is an important caveat to that though in that while aggregating statistics for reporting (i.e. SUM of how many numbers or calls) character strings are MUCH slower to count than integers. To account for this its important to add an integer as an identity column that you can use for counting instead of the varchar or char field datatype.

Resources