I have been attempting to find something that tells me one way or the other on how to write a prepared statement when a parameter is static.
Should all parameters always use placeholders, even when the value is always the same?
SELECT *
FROM student
WHERE admission_status = 'Pending' AND
gpa BETWEEN ? AND ?
I.e., in this example admission_status will never be anything but 'Pending', but gpa will change depending on user input or different method calls.
I know this isn't the best example, but the reason I ask is that I have found a noticeable difference in execution speed when replacing all static parameters that were using a placeholder with their literal values in queries that are hundreds of lines long.
Is it acceptable to do this? Or does this go against the standards of prepared statement use? I would like to know one way or the other before I begin to "optimize" larger queries by testing new indexes and replacing the ?s with values to see if there is a boost in execution speed.
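For what it's worth, mixing an inlined constant with placeholders is perfectly legal in a prepared statement. A minimal sketch, using Python's sqlite3 as a stand-in driver and a hypothetical student table modeled on the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, admission_status TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ann", "Pending", 3.4), ("Bob", "Accepted", 3.9), ("Cam", "Pending", 2.1)],
)

# Static value inlined as a literal; only the genuinely variable
# parts of the query are bound through placeholders.
sql = """
    SELECT name FROM student
    WHERE admission_status = 'Pending'
      AND gpa BETWEEN ? AND ?
"""
rows = conn.execute(sql, (3.0, 4.0)).fetchall()
print(rows)  # [('Ann',)]
```

The placeholders still protect against injection for the values that actually vary; the inlined literal just gives the planner a concrete value to work with.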
Related
oracle-pro-c has recommended using indicator variables as "NULL flags" attached to host variables. As per documentation, we can associate every host variable with an optional indicator variable (short type). For example:
short indicator_var;
EXEC SQL SELECT xyz INTO :host_var:indicator_var
FROM ...;
We can also alternatively use NVL as documented in https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions105.htm, for example, as:
EXEC SQL SELECT NVL(TO_CHAR(xyz), '') INTO :host_var
FROM ...;
Which one is better in terms of performance?
Ah, Pro*C. It's been a while, over 20 years, but I think my memory serves me well here.
Using the indicator variables will be better in terms of performance, for two reasons:
The SQL is simpler, so there is less parsing and fewer bytes will be transferred over the network to the database server.
In Oracle in general, a "Null" value is encoded in 0 bytes. An empty string contains a length (N bytes) and storage (0 bytes). So a NULL value is encoded more efficiently in the returned result set.
Now in practice, you won't notice the difference much. But you asked :-)
In my experience NVL was much slower than indicator variables, especially when nested (yes, you can nest them) for INSERT or UPDATE of fields. It was a long time ago and I don't remember the exact circumstances, but I remember the performance gain was real.
On SELECT it was not as obvious, but using indicator variables also lets you detect cases where truncation happens.
If you use VARCHAR or UVARCHAR columns, there is a third option for detecting NULL/empty strings in Oracle: the len field will be set to 0, meaning the value is empty. As Oracle does not distinguish between NULL and zero-length strings, it is more or less the same.
I may be erring towards pedantry here, but say I have a field in a database that currently has two values (but may contain more in future). I know I could name this as a flag (e.g. MY_FLAG) containing values 0 and 1, but should more values be required (e.g. 0,1,2,3,4), is it still correct to call the field a flag?
I seem to recall reading somewhere that a flag should always be binary, and anything else should be labelled more appropriately, but I may be mistaken. Does anyone know if my thinking is correct? If so, can you point me to any information on this, please? My googling has turned up nothing!
Thanks very much :o)
Flags are usually binary because when we say flag, we mean it is either up (1) or down (0).
Just as a flag in the military is raised or lowered to signal; the concept of flagging is taken from there.
Regarding what you are saying:
"your words: values be required (e.g. 0,1,2,3,4)"
In such a situation, use an enum. Enumerations are built for such cases. Alternatively, we sometimes document the meaning of the numeric values in comments or in a separate file so that memory can be saved (using a tinyint or bit field). But never name such a field a flag.
Flags have a standard meaning: either up or down. Calling it a flag won't cause an error or anything, but it is not good practice. Hope you get it.
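As a quick sketch of the enum suggestion above, in Python (the AdmissionStatus name and its values are invented for illustration):

```python
from enum import IntEnum

# A named enumeration documents each stored value,
# unlike a bare multi-valued "flag" column.
class AdmissionStatus(IntEnum):
    PENDING = 0
    ACCEPTED = 1
    REJECTED = 2
    WAITLISTED = 3
    DEFERRED = 4

# The small integer is what gets stored in the tinyint column;
# the name is what the code reads.
stored = int(AdmissionStatus.WAITLISTED)
print(stored)                        # 3
print(AdmissionStatus(stored).name)  # WAITLISTED
```

The column itself would then be named after the concept (admission_status), not "flag".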
It's all a matter of conventions and the ability to maintain your database/code effectively. Technically, you can have a column called my_flag defined as a varchar and hold values like "batman" and "Barack Obama".
By convention, flags are boolean. If you intend to have other values there, it's probably a better idea to call the column something else, like some_enum, or my_code.
Very occasionally, people talk about (for example) tri-state flags, but Wikipedia and most of the dictionary definitions that I read reserve "flag" for binary / two state uses1.
Of course, neither Wikipedia nor any dictionary has the authority to say some English usage is "incorrect". "Correct" usage is really "conventional" usage; i.e. what other people say / write.
I would argue that saying or writing "tri-state flag" is unconventional, but it is unambiguous and serves its purpose of communicating a concept adequately. (And the usage can be justified ...)
1 - Most, but not all; see http://www.oxforddictionaries.com/definition/english/flag.
Don't call anything "flag". Or "count" or "mark" or "int" or "code". Name it like everything else in code: after what it means.
workday {mon..fri}
tall {yes,no}
zip_code {00000..99999}
state {AL..WY}
Notice that (something like) yes/no plays the 'flag' role of indicating a permanent dichotomy, in lieu of boolean, which does that in the rest of the universe outside SQL. Use it when the specification/contract really is whether something is so. If a design might add more values, you should use a different type.
Of course if you want to add more info to a name you can. Add distinctions that are meaningful if you can.
workday {monday..friday}
workday_abbrev {mon..fri}
is_tall {yes,no}
zip_plus_5 {00000-99..99999-99}
state_name {Alabama..Wyoming}
state_2 {AL..WY}
I am working in a SQL Server environment, heavy on stored procedures, where a lot of the procedures use 0 and '' instead of Null to indicate that no meaningful parameter value was passed.
These parameters appear frequently in the WHERE clauses. The usual pattern is something like
WHERE ISNULL(SomeField,'') =
    CASE @SomeParameter
        WHEN '' THEN ISNULL(SomeField,'')
        ELSE @SomeParameter
    END
For various reasons, it's a lot easier to make a change to a proc than a change to the code that calls it. So, given that the calling code will be passing empty strings for null parameters, what's the fastest way to compare to an empty string?
Some ways I've thought of:
@SomeParameter = ''
NULLIF(@SomeParameter,'') IS NULL
LEN(@SomeParameter) = 0
I've also considered inspecting the parameter early on in the proc and setting it to NULL if it's equal to '', and just doing a @SomeParameter IS NULL test in the actual WHERE clause.
What other ways are there? And what's fastest?
Many thanks.
Sorting out the parameter at the start of the proc must be faster than multiple conditions in a where clause or using a function in one. The more complex the query, or the more records that have to be filtered, the greater the gain.
The bit that would scare me is if this lack of nullability in the procedure arguments has got into the data as well. If it has, when you start locking things down your queries are going to come back with the "wrong" results.
If this product has some longevity, then I'd say easy is the wrong solution long term, and it should be corrected in the calling applications. If it doesn't, then maybe you should just leave it alone, as all you would be doing is sweeping the mess from under one rug to under another...
How are you going to test these changes? The chances of introducing a wee error, while bored out of your skull making the same change again and again and again, are very high.
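The "sort out the parameter at the start of the proc" approach can be sketched as follows, using Python's sqlite3 as a stand-in and a hypothetical student table. The point is that the sentinel is normalized to NULL once, so the WHERE clause stays simple:

```python
import sqlite3

def find_students(conn, status=""):
    # Normalize the '' sentinel once, up front, instead of
    # repeating CASE/ISNULL logic inside the WHERE clause.
    status = status if status != "" else None
    sql = """
        SELECT name FROM student
        WHERE (:status IS NULL OR admission_status = :status)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, {"status": status})]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, admission_status TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Ann", "Pending"), ("Bob", "Accepted")])

print(find_students(conn, ""))         # ['Ann', 'Bob']  ('' means "no filter")
print(find_students(conn, "Pending"))  # ['Ann']
```

In T-SQL the equivalent would be a `SET @SomeParameter = NULLIF(@SomeParameter, '')` at the top of the proc, followed by a plain `@SomeParameter IS NULL` test.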
I came across some existing code in our production environment, given to us by our vendor. They use a string of comma-separated values to store filtered results from a DB. Keep in mind that this is for a proprietary scripting language called PowerOn that interfaces with a database residing on an AIX system, but it's a language that supports strings, integers, and arrays.
For example, we have:
Account
----------------
123
234
3456
28390
The pseudocode might look like:
Define accounts As String
For Each Account
accounts=accounts + CharCast(Account) + ","
End
as opposed to something I would expect to see like
Define accounts As Integer Array(99)
Define index as Integer=0
For Each Account
accounts(index)=Account
index=index+1
End
By the time the loop is done, accounts will look like 123,234,3456,28390, (note the trailing comma). The string is later used to test whether a specific instance exists, like so:
If CharSearch("28390", accounts) > 0 Then Call DoSomething
In the example, the statement evaluates to true and DoSomething gets called. Given the option of arrays, why would one want to store integer values within a string of comma-separated values? In every language I've come across, it's almost always more expensive to perform string-based operations than integer-based operations.
Considering I haven't seen this technique before and my experience is somewhat limited, is there a name for this? Is this common practice, or is this just another example of being too stringly typed? To extend the existing code, should I continue using the string method? Did we get cruddy code from our vendor?
What I put in the comment still holds, but my real answer is: it's probably a design decision made for compatibility/portability. In your integer-array case (at a low enough level of the API) you'd typically find yourself asking questions like: what's a safe guess for the size of an integer on today's machines? What about endianness?
The most portable and most flexible of all data formats always has been and always will be printed representation. It may not be as fast to process that but that's where adapters/converters or so kick in. I wouldn't be surprised to find (human-readable) printed representation of something especially in database APIs like you describe.
If you want something fast, just take whatever is given to you, convert it to a more efficient internal format, do your processing, and convert it back.
There's nothing inherently wrong with using comma-separated strings instead of arrays. Sure, you can't readily access the n-th element of such a collection, but if such random access is not needed then there's no penalty for it, right?
As far as I know, Oracle DB stores NUMBER values as strings (and, if my memory is correct, DATEs as well) for very practical reasons.
In your specific example it looks like using strings is overkill when passing data around without crossing process boundaries. But could it be that the choice of a string data type makes more sense when sending data over the wire or storing it on disk?
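One concrete hazard with the comma-separated approach, and the convert-to-an-internal-format remedy suggested above, can be sketched in Python (the account numbers are taken from the question):

```python
# The stringly-typed collection, exactly as the PowerOn loop builds it.
accounts_csv = "123,234,3456,28390,"

# Substring search (the CharSearch approach) can produce false positives:
# "839" matches inside "28390" even though 839 is not an account.
print("839" in accounts_csv)   # True

# Parsing once into a set gives exact membership tests:
accounts = {int(tok) for tok in accounts_csv.split(",") if tok}
print(839 in accounts)    # False
print(28390 in accounts)  # True
```

Delimiting the search term with commas on both sides works around the false positive, but converting to a real collection avoids the whole class of bugs.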
The problem is that I need to ignore the stray letters in the numbers, e.g. 19A or B417.
Take a look here:
Extracting Numbers with SQL Server
There are several hidden "gotcha's" that are explained pretty well in the article.
It depends on how much data you're dealing with, but doing that in SQL is probably going to be slow. Not everyone will agree with me here, but I think all data processing should be done in application code.
I would just take the rows you want and filter them in the application you're dealing with.
The easiest thing to do here would be to create a CLR function which takes the address. In the CLR function, you would take the first part of the address (assuming it is the house number), which should be delimited by whitespace.
Then, replace any non-numeric characters with an empty string.
You should have a string representing an integer at that point which you can pass to the Parse method on the Int32 class to produce an integer, which you can then check to see if it is odd.
I recommend a CLR function (assuming you are using SQL Server 2005 and above, and can set the compatibility level of the database) here because it's easier to perform string manipulations in .NET than it is in T-SQL.
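The logic described above (take the first whitespace-delimited token, strip non-digits, parse, test parity) can be sketched in Python rather than as a CLR function; house_number_is_odd is a hypothetical name:

```python
import re

def house_number_is_odd(address):
    # Take the first whitespace-delimited token (assumed to be the
    # house number), strip non-digits to handle "19A" or "B417",
    # then check the parity of what remains.
    first = address.split()[0]
    digits = re.sub(r"\D", "", first)
    if not digits:
        return None  # no number found in the first token
    return int(digits) % 2 == 1

print(house_number_is_odd("19A Main St"))               # True
print(house_number_is_odd("B417 Oak Ave"))              # True (417 is odd)
print(house_number_is_odd("1200 South 19th Blvd East")) # False
```

The same string manipulation is straightforward in .NET, which is the point of the CLR recommendation; the sketch just shows the algorithm itself.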
Assuming [Address] is the column with the address in it...
Select Case Cast(Substring(Reverse(Address),
                 PatIndex('%[0-9]%', Reverse(Address)), 1) as Integer) % 2
       When 0 Then 'Even'
       When 1 Then 'Odd'
       End
From Table
I've been through this drill before. The best alternative is to add a column to the table or to a subsidiary joinable table that stores the inferred numerical value for the purpose. Then use iterative queries to set the column repeatedly until you get sufficient accuracy and coverage. You'll end up encountering stuff like "First, Third," "451a", "1200 South 19th Blvd East", and worse.
Then filter new and edited records as they occur.
As usual, UDFs should be avoided as being slow and (comparatively) harder to debug.