TSQL Make All Words That Are Missing Vowels Upper Case - sql-server

I am having a difficult time producing a script that makes all substrings within a string upper case if they contain no vowels.
For example:
'Hammer Products Llc' should be: 'Hammer Products LLC'.
Or:
'49 Ways Ltd' Should be: '49 Ways LTD'.
As an application developer I am still getting to grips with the concept of T-SQL set-based processing and avoiding iteration whenever possible. So aside from the task of identifying those rows that contain words missing vowels... for the life of me I cannot think of a set-based way to identify and then update those substrings other than iterating through them.
So far I am working on the first part: identifying the rows that contain substrings missing vowels. My only thought is to take each string and Split() it with a custom function, then test each of the split words to see whether it is all consonants, and if so, update that word.
My major concern is that this approach will be very heavy to process. This is a real brain twister, and any help in the right direction would be greatly appreciated!

You can readily find which strings contain a word with no vowels. On SQL Server 2016+, for example, using STRING_SPLIT (assuming words are space-separated):
where exists (select 1 from string_split(str, ' ') as w
              where w.value <> '' and lower(w.value) not like '%[aeiou]%')
Do note that this doesn't take punctuation into account. That gets a bit harder, because SQL Server doesn't support regular expressions.
Modifying a part of a string to be upper case is much, much harder. Your idea of using split() is definitely one approach. Another is to write a user-defined function.
My recommendation, though, is to do this work in another tool. If you are learning SQL, try using it for tasks it is more suited for.
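If you do move this work to another tool, the per-word logic is simple. A minimal Python sketch, assuming words are space-separated (the function name is mine):

```python
def upper_vowelless_words(s):
    """Upper-case every space-separated word that contains no vowels."""
    vowels = set("aeiou")
    return " ".join(
        w.upper() if not (vowels & set(w.lower())) else w
        for w in s.split(" ")
    )

print(upper_vowelless_words("Hammer Products Llc"))  # Hammer Products LLC
print(upper_vowelless_words("49 Ways Ltd"))          # 49 Ways LTD
```

The same word-by-word test (does the set of letters intersect the vowels?) is what any T-SQL split-based solution would have to express row by row.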


TEXTSPLIT combined with BYROW returns an unexpected result when using an array of strings as input

I am testing the following simple case:
=LET(input, {"a,b;c,d;" ; "e,d;f,g;"},
BYROW(input, LAMBDA(item, TEXTJOIN(";",,TEXTSPLIT(item,",",";", TRUE)))))
Since TEXTJOIN is the inverse operation of TEXTSPLIT, the output should be the same as the input without the trailing ;, but it doesn't work like that.
If I try using a range as input instead, it works. It also works for a single string:
=LET(input, "a,b;c,d;", TEXTJOIN(";",,TEXTSPLIT(input,",",";", TRUE)))
it returns: a,b;c,d
What am I doing wrong here? I think it might be a bug. Per the TEXTSPLIT documentation there is no constraint on combining TEXTSPLIT with BYROW when using an array of strings as input.
Not sure if this would classify as an answer but thought I'd share my attempt at it.
I don't think the problem here is TEXTSPLIT(). I tried different things. First I tried to incorporate FILTERXML() to do the split, with the exact same result. For good measure:
=BYROW({"a,b;c,d;","e,d;f,g;"},LAMBDA(item,TEXTJOIN(";",,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(item,",",";"),";","</s><s>")&"</s></t>","//s"))))
Then I tried to enforce array usage with T(IF(1,TEXTSPLIT("a,b;c,d;",{",",";"},,1))) but Excel would not budge.
The above led me to believe the problem is in fact BYROW() itself. Even though the documentation says the first parameter takes an array, its handling of arrays of strings does seem to be buggy, and you could report it as such.
For what it's worth for now: you could use REDUCE() as mentioned in the comments and in the linked answer, but I'd reserve that for more intricate stacking of unevenly distributed columns/rows. In your case MAP() will work and is simpler than BYROW():
=LET(input, {"a,b;c,d;";"e,d;f,g;"},
MAP(input, LAMBDA(item, TEXTJOIN(";",,TEXTSPLIT(item,",",";", TRUE)))))
And to be honest, this is kind of what MAP() is designed for anyway.

Declaring a 10 item array as Dim items(10) As String. Any disadvantages? [duplicate]

In declaring an array in VB, would you ever leave the zero element empty and adjust the code to make it more user friendly?
This is for Visual Basic 2008
No, I wouldn't do that. It seems like it might help maintainability, but that's a very short-sighted view.
Think about it this way. It only takes each programmer who has to understand and maintain the code a short amount of time to get comfortable with zero-indexed arrays. But if you're using one-based arrays, which are unlike those found in almost all other VB.NET code, and in fact almost every other common programming language, it will take everyone on the team much longer. They'll be constantly making mistakes, tripping up because their natural assumptions aren't accurate in this one special case.
I know how it feels. When I worked in VB 6, I loved one-based arrays. They were very natural for the type of data I was storing, and I used them all over the place. That was perfectly workable there, because VB 6 gave you explicit syntax to specify both the upper and lower bounds of an array. That's not the case in VB.NET (a newer but incompatible version of the Visual Basic language), where all arrays have to be zero-indexed. I had a hard time switching to VB.NET's zero-based arrays for the first couple of days, but after that initial period of adjustment, I can honestly say I've never looked back.
Some might argue that leaving the first element of every array empty would consume extra memory needlessly. While that's obviously true, I think it's a secondary reason behind the one I presented above. Good developers write code for others to read, so I commend you for considering how to make your code logical and understandable. You're on the right path by asking this question. But in the long run, I don't think this decision is a good one.
There might be a handful of exceptions in very specific cases, depending on the type of data that you're storing in the array. But again, failing to do this across the board seems like it would hurt readability in the aggregate, rather than helping it. It's not particularly counter-intuitive to simply write the following, once you've learned how arrays are indexed:
For i As Integer = 0 To (myArray.Length - 1)
'Do work
Next
And remember that in VB.NET, you can also use the For Each statement to iterate through your array elements, which many people find more readable. For example:
For Each i As Integer In myArray
'Do work
Next
First, it is about being programmer-friendly, not user-friendly. Users never see whether the code is 0-based or 1-based.
Second, 0-based is the default and is what you will encounter more and more.
Third, 0-based is more natural to the computer. At the lowest level, a bit has two states, 0 and 1, not 1 and 2.
I have upgraded a couple of VB6 projects to VB.NET. Converting to 0-based arrays at the start is better than debugging the code later.
Most of my VB.Net arrays are 0-based and every element is used. That's usual in VB.Net and code mustn't surprise the reader. Readability is vital.
Any exceptions? Maybe if I had a program ported from VB6, so it used 0-based arrays with unused initial elements, and it needed a small change, I might match the pattern of the existing code. Least surprise again.
99 times out of 100 the question shouldn't arise because you should be using List(Of T) rather than an array!
Who are the "users" that are going to see the array indexes? Any good developer will be able to handle a zero-indexed array and no real user should ever see them. If the user has to interact with the array, then make an actually user-friendly system for doing so (text or a 1-based virtual index or whatever is called for).
In classic Visual Basic (VB6) it is possible to declare an array starting from 1, if you find it inconvenient to use a 0-based array:
Dim array(1 To 10) As Integer
(VB.NET no longer allows this; the lower bound must be 0.) It is just a matter of taste. I use 1-based arrays in Visual Basic but 0-based arrays in C ;)

can my code improve from using LINQ?

I have this code, which works fine, but is slow on large datasets.
I'd like to hear from the experts if this code could benefit from using Linq, or another method, and if so, how?
Dim array_of_strings As String()
' now I add strings to my array, these come from external file(s).
' This does not take long
' Throughout the execution of my program, I need to validate millions
' of other strings.
Dim search_string As String
Dim indx As Integer
' So we get million of situation like this, where I need to find out
' where in the array I can find a duplicate of this exact string
search_string = "the_string_search_for"
indx = array_of_strings.ToList().IndexOf(search_string)
Each of the strings in my array are unique, no duplicates.
This works pretty well but, like I said, it is too slow for larger datasets. I am running this query millions of times; currently it takes about a minute for a million queries, which is too slow for my liking.
There's no need to use LINQ. Note also that calling ToList() on every query copies the whole array each time, which is itself costly.
If you used a keyed data structure like a Dictionary, each lookup would be O(1) on average, at the cost of a slightly longer process of filling the structure. But you do that once; then, over a million searches, you come out well ahead.
See the description of Dictionary at this site:
https://msdn.microsoft.com/en-us/library/7y3x785f(v=vs.110).aspx
Since (I think) you're talking about a collection that is its own key, you could save some memory by using SortedSet(Of T), at the cost of O(log n) lookups:
https://msdn.microsoft.com/en-us/library/dd412070(v=vs.110).aspx
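In VB.NET that means building a Dictionary(Of String, Integer) from the array once, up front. The same idea sketched in Python (the data and names are illustrative):

```python
# Build the string -> index map once; afterwards each lookup is an
# O(1) average-case hash probe instead of an O(n) scan of the list.
array_of_strings = ["alpha", "beta", "gamma"]  # stand-in for the unique strings
index_of = {s: i for i, s in enumerate(array_of_strings)}

print(index_of.get("gamma", -1))  # 2
print(index_of.get("delta", -1))  # -1, the same "not found" convention as IndexOf
```

The dictionary is filled in one pass; every subsequent query is a single hash lookup, which is why a million queries come out far ahead of a million linear scans.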
No, I don't think it can benefit from LINQ.
LINQ queries are relatively slow for a tight loop like this.
You might try multithreading it, however.

How should I deal with escapes when splitting a delimited nvarchar(max) in SQL

I am in the process of rolling over a bunch of old stored procedures that take NVARCHAR(MAX) strings of comma and/or semicolon separated values (never mind about one value per variable etc.). The code is currently using the CHARINDEX approach described in this question, though in principle any of the approaches would work (I'm tempted to replace it with the XML one, because neatness).
The question, though, is: what is the most efficient way of handling escaped delimiters? Obviously the lowest-level approach is a character-by-character parser, but I can't shake the feeling that (1) that's going to be horrible when executed a million times in close succession and (2) it'll be overcomplicated for the situation.
Basically, I want to handle 3 possible escapes:
"\\", "\,", and "\;" somewhere in my string. What's the best way to do it? I should add that, ideally, I don't want to make any assumptions about what characters are included in the string.
Sample data would look something like the below.
Value1,Value\,2,ValueWithSlashAtTheEnd\\,ValueWithSlashAndCommaAtTheEnd\\\,
I'm actually splitting to rows rather than columns, but the principle is the same; I'd expect the below output typically:
SomeName
^^^^^^^^
Value1
Value,2
ValueWithSlashAtTheEnd\
ValueWithSlashAndCommaAtTheEnd\,
Needless to say, the escapes could occur anywhere in a value, and ideally I'd like to handle for semicolons as well, but I'll probably be able to infer that from the comma behaviour.
Just pass your split function an edited string:
replace(replace(@yourstring, '\\', '^'), '\,', '#')
Then replace back in each returned value:
replace(replace(@returnedstring, '#', ','), '^', '\')
Replace ^ and # with any two characters that cannot occur in the string.
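That placeholder trick is easy to sanity-check outside SQL. A Python sketch of the same approach (assuming, as above, that '^' and '#' never occur in the data):

```python
def split_escaped(s, delim=","):
    # Hide escaped sequences behind placeholder characters, split on the
    # now-unambiguous delimiter, then restore the original characters.
    hidden = s.replace("\\\\", "^").replace("\\" + delim, "#")
    return [part.replace("#", delim).replace("^", "\\")
            for part in hidden.split(delim)]

sample = r"Value1,Value\,2,ValueWithSlashAtTheEnd\\,ValueWithSlashAndCommaAtTheEnd\\\,"
for row in split_escaped(sample):
    print(row)  # matches the four expected rows above
```

Note that replacing \\ before \, matters: it ensures a trailing escaped backslash is consumed before the escaped-comma rule looks at the remaining text.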

Linking filenames or labels to numeric index

In a C99+SDL game, I have an array that contains sound effects (SDL_mixer chunk data and some extra flags and filename string) and is referenced by index such as "sounds[2].data".
I'd like to be able to call sounds by filename, but I don't want to strcmp my way through the array until a match is found. That way, as I add more sounds, change the order, or allow player-defined sound mods, they can still be called with a common identifier (such as "SHOT01" or "EXPL04").
What would be the fastest approach for this? I have heard about hashing, which would give something similar to Lua's string indexes (such as table["field"]), but I don't know anything about the topic and it seems fairly complicated.
Just in case it matters, I plan for the filenames or labels to be 6-to-8-character all-caps names (such as "SHOT01.wav").
So to summarize, where can I learn about hashing short strings like that, or what would be the fastest way to keep track of something like sound effects so they can be called using arbitrary labels or identifiers?
I think in your case you can probably just keep all the sounds in a sorted data structure and use a fast search algorithm to find matches. Something like a binary search is very simple to implement and gives good performance.
However, if you are interested in hash tables and hashing, the basics of it all are pretty simple. There is no place like Wikipedia to get the basics down and you can then tailor your searches better on Google to find more in depth articles.
The basics are simple: you start out with a fixed-size array and store everything in there. To figure out where to store something, you take the key (in your case the sound name) and perform some operation on it that gives you the exact location where the value can be found. The simplest scheme for string hashing is to add up all the letters of the string as integer values, then take that sum modulo the array size to get an index:
position = SUM(string letters) % [array size]
Of course, multiple strings will naturally have the same sum and thus give you the same position. This is called a collision, and collisions can be handled in many ways. The simplest is to have an array of lists rather than an array of values, and simply append to the list every time there is a collision. When searching for a value, iterate the bucket's list and find the entry you need.
Ideally a good hash function has few collisions and is quick to compute, which is what provides the huge performance boost.
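The sum-and-modulus scheme with collision lists described above can be sketched like this (a toy illustration, not production code):

```python
# Toy hash table: sum-of-character-codes hash, with a list per bucket
# ("chaining") to handle collisions.
TABLE_SIZE = 16
table = [[] for _ in range(TABLE_SIZE)]

def hash_key(key):
    # position = SUM(string letters) % [array size]
    return sum(ord(c) for c in key) % TABLE_SIZE

def put(key, value):
    table[hash_key(key)].append((key, value))

def get(key):
    # Walk the bucket's list to resolve collisions.
    for k, v in table[hash_key(key)]:
        if k == key:
            return v
    return None

put("SHOT01", 0)
put("EXPL04", 1)
print(get("SHOT01"), get("EXPL04"), get("MISSING"))  # 0 1 None
```

In C the structure would be the same: an array of linked lists (or small dynamic arrays) indexed by the hash of the 6-to-8-character label.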
I hope this helps :)
You are right, when it comes to mapping objects with a set of string keys, hash tables are often the way to go.
I think this article on wikipedia is a good starting point to understand hash table mechanism: http://en.wikipedia.org/wiki/Hash_table
