Oracle Pro*C indicator variables vs NVL in query execution

The Oracle Pro*C documentation recommends using indicator variables as "NULL flags" attached to host variables: every host variable can be associated with an optional indicator variable (of type short). For example:
char host_var[20];
short indicator_var;
EXEC SQL SELECT xyz INTO :host_var:indicator_var
FROM ...;
Alternatively, we can use NVL, as documented at https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions105.htm, for example:
EXEC SQL SELECT NVL(TO_CHAR(xyz), '') INTO :host_var
FROM ...;
Which one is better in terms of performance?

Ah, Pro*C. It's been a while, over 20 years, but I think my memory serves me well here.
Using the indicator variables will be better in terms of performance, for two reasons:
The SQL is simpler, so there is less parsing and fewer bytes will be transferred over the network to the database server.
In Oracle in general, a "Null" value is encoded in 0 bytes. An empty string contains a length (N bytes) and storage (0 bytes). So a NULL value is encoded more efficiently in the returned result set.
Now in practice, you won't notice the difference much. But you asked :-)

In my experience, NVL was much slower than indicator variables, especially if nested (yes, you can nest them) for INSERT or UPDATE of fields. It was a long time ago and I don't remember the exact circumstances, but I remember the performance gain was real.
On SELECT the difference was not as obvious, but using indicator variables also lets you detect cases where truncation happens.
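As a refresher on the mechanics: after a fetch, Oracle sets the indicator to -1 when the column was NULL (the host variable is then left undefined), to 0 when the value arrived intact, and to the original length when the value was truncated to fit the host variable. A minimal sketch (table and column names are made up):
char  host_var[11];
short indicator_var;

EXEC SQL SELECT xyz INTO :host_var:indicator_var
         FROM some_table
         WHERE id = 1;

if (indicator_var == -1) {
    /* xyz was NULL; host_var holds no meaningful value */
} else if (indicator_var > 0) {
    /* value was truncated; indicator_var is the original length */
} else {
    /* value fetched intact */
}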
If you use VARCHAR or UVARCHAR columns, there is a third option for detecting NULL/empty strings in Oracle: the len field will be set to 0, meaning the value is empty. Since Oracle does not distinguish between NULL and zero-length strings, it is more or less the same.
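For completeness, a Pro*C VARCHAR host variable precompiles into a struct with a len and an arr member, so the check looks like this (again a sketch; note that without an indicator variable, fetching a NULL normally raises ORA-01405 unless the precompiler is configured to tolerate it, e.g. UNSAFE_NULL=YES):
VARCHAR name[30];   /* precompiles to: struct { unsigned short len; unsigned char arr[30]; } name; */

EXEC SQL SELECT xyz INTO :name
         FROM some_table
         WHERE id = 1;

if (name.len == 0) {
    /* NULL or empty string -- Oracle stores both the same way */
}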

Related

Should All Parameters In A Prepared Statement Always Use Placeholders?

I have been attempting to find something that tells me one way or the other on how to write a prepared statement when a parameter is static.
Should all parameters always use placeholders, even when the value is always the same?
SELECT *
FROM student
WHERE admission_status = 'Pending' AND
gpa BETWEEN ? AND ?
i.e., in this example admission_status will never be anything but 'Pending', but gpa will change depending on user input or different method calls.
I know this isn't the best example, but the reason I ask is that I have found a noticeable difference in execution speed when replacing all static parameters that were using a placeholder with their static-value counterparts in queries that are hundreds of lines long.
Is it acceptable to do this? Or does this go against the standards of prepared statement use? I would like to know one way or the other before I begin to "optimize" larger queries by testing new indexes and replacing the ?s with values to see if there is a boost in execution speed.
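For what it's worth, the usual explanation for the difference you're seeing is that a literal lets the optimizer consult column statistics for that exact value at plan time, while a placeholder forces a generic plan that must work for any value. Roughly:
-- All placeholders: one generic plan, reused whatever the status is
SELECT *
FROM student
WHERE admission_status = ? AND
      gpa BETWEEN ? AND ?

-- Static literal: the plan can exploit how selective 'Pending' really is
SELECT *
FROM student
WHERE admission_status = 'Pending' AND
      gpa BETWEEN ? AND ?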

Difference between NULL and NULL("") in DML

What is the difference between writing a NULL value in the DML and NULL with a default value? Is writing a NULL value a bad practice? How do I decide which method to adopt?
NULL stores a one-byte flag along with the data, whereas NULL('') does not store the one-byte flag; instead, any blank value in the field is treated as NULL, i.e. it has "" as its default and treats that as equal to NULL. So it is good practice to use NULL with default values.
Sarveshmishra is correct. To answer the other part of your question - whether it is good practice or not depends on how you're using the record format.
NULL as byte flags is primarily an unambiguous (i.e. no risk of collision with real data) representation of NULL for use within Ab Initio, which comes at the cost of extra record width.
Good for: When strictly correct null-handling is required within and between graphs (and when the record-width and translation (reformat) cost is acceptable).
Bad for: File-based data interchange with source and destination systems. (Note that database output via a bulk loader (e.g. SQL Server bcp) is file-based.)
NULL(default) encodes NULL as a specific character sequence (or a zero-width string). The encoding chosen must match the interface contract with upstream/downstream systems, otherwise there is a risk of data collision (example: in a database, "" <> NULL, but NULL("") treats "" as NULL).
Good for: Data interchange upstream/downstream, or where the distinction between e.g. "" and NULL is not important.
Bad for: Guaranteed strict null-handling.
For data-interchange with strict null-handling and zero risk of collision, an alternative strategy is to define explicit null flags in the interface contract.
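To make the collision concrete, this is the database-side distinction that a NULL("") encoding erases (generic SQL; table and column names are invented):
-- In most databases (Oracle is a notable exception) '' and NULL are distinct:
SELECT COUNT(*) FROM customer WHERE middle_name = '';       -- genuine empty strings
SELECT COUNT(*) FROM customer WHERE middle_name IS NULL;    -- true NULLs

-- A NULL("") interchange file writes both as "", so after a round trip
-- through the file the two populations can no longer be told apart.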

What's the fastest way to compare to an empty string in T-SQL?

I am working in a SQL Server environment, heavy on stored procedures, where a lot of the procedures use 0 and '' instead of Null to indicate that no meaningful parameter value was passed.
These parameters appear frequently in the WHERE clauses. The usual pattern is something like
WHERE ISNULL(SomeField, '') =
      CASE @SomeParameter
          WHEN '' THEN ISNULL(SomeField, '')
          ELSE @SomeParameter
      END
For various reasons, it's a lot easier to make a change to a proc than a change to the code that calls it. So, given that the calling code will be passing empty strings for null parameters, what's the fastest way to compare to an empty string?
Some ways I've thought of:
@SomeParameter = ''
NULLIF(@SomeParameter, '') IS NULL
LEN(@SomeParameter) = 0
I've also considered inspecting the parameter early in the proc and setting it to NULL if it's equal to '', and then just doing a @SomeParameter IS NULL test in the actual WHERE clause.
What other ways are there? And what's fastest?
Many thanks.
Sorting out the parameter at the start of the proc has to be faster than multiple conditions in a WHERE clause, or using a function in one. The more complex the query, or the more records that have to be filtered, the greater the gain.
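As a sketch of that approach (made-up table and column, @-prefixed per T-SQL convention):
IF @SomeParameter = ''
    SET @SomeParameter = NULL;

SELECT SomeField
FROM SomeTable
WHERE (@SomeParameter IS NULL OR SomeField = @SomeParameter);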
The bit that would scare me is if this lack of nullability in the procedure arguments has got into the data as well. If it has, then when you start locking things down, your queries are going to come back with the "wrong" results.
If this product has some longevity, then I'd say easy is the wrong solution long term, and it should be corrected in the calling applications. If it doesn't, then maybe you should just leave it alone, as all you would be doing is sweeping the mess from under one rug to under another...
And how are you going to test these changes? The chances of introducing a wee error while bored out of your skull making the same change again and again and again are very high.

Is it better to maintain 3 small columns or 1 large column in a Table?

Three small number columns [Number(1)] >>
OptionA | 0/1
OptionB | 0/1
OptionC | 0/1
or one larger string column [Varchar2(29)] >>
Options | OptionA=0/1|OptionB=0/1|OptionC=0/1
I'm not sure about the way the database handles tables, but I think that maintaining three Number(1) columns is better than one Varchar2(29) column!
-EDIT-
Let me explain the situation a bit more:
I am working on a common framework where all incoming/outgoing requests/responses are tracked; these interactions can be channeled to a DB, a file, or JMS. All of the configuration is loaded from a table which has a column corresponding to the output type. Currently I'm using "DB=1|FILE=1|JMS=0" as the value of that column, so that later, if anyone wants to add this for their module, they can easily understand what is going on. In my code I've written simple logic which splits the string on "|" and then uses the exclusive-or operator to switch between choices in a switch case.
Everything is already done, but I'm not convinced that one large column is better than three small ones; plus, switching would remove the string-splitting I'm doing.
-EDIT-
I finally got it clarified: there may be a situation where we have to add more options. In that case, if we add the data column-wise, it will mean modifying the table, changing the entity, adding more ifs, and so on. Instead, I ended up making an enum out of it with simple bit-wise logic to switch between options; this way I only need to modify the enum and add a new handler for the new option, and then we are good to go.
Using a single column to store multiple pieces of data is probably the worst thing you can do in a database.
Violating first normal form has at least the following disadvantages:
More difficult to query. OptionA = 1 and OptionB = 1 and OptionC = 0 versus substr(options, 9, 1) = '1' and substr(options, 19, 1) = '1' and substr(options, 29, 1) = '0'.
Less flexible. What happens when you need to add another option? Adding a new column is easy. Changing the format could mess up old queries. For example, if someone tries to read OptionC with substr(options, -1, 1). (Although this is a good reason to use a third option: a separate table.)
No type safety. This can be a very subtle and tricky problem. Let's say you write substr(options, 9, 1) = 1 instead of substr(options, 9, 1) = '1'; the implicit conversion to a number will fail as soon as it hits a character that isn't numeric. If anyone ever gets the format wrong, a single value could ruin lots of queries. Or worse, it only intermittently crashes a small number of queries, because the access paths keep changing. (Although you can prevent this with a check constraint.)
Slower queries. Normally the amount of work done in an expression or condition isn't a significant cost for a query. But adding a lot of unnecessary string manipulation can make a difference.
Less optimization. Oracle can only build efficient query plans if it can understand your data. For example, let's say that OptionA is "0" 99.9% of the time. When you filter on OptionA = 0, Oracle can use a histogram to make a very accurate prediction about the number of rows returned. But for substr(options, 9, 1) = '1' you'll only get a wild guess. If you have complicated queries using this column, you may spend a lot of time trying to "fix" the cardinality estimates. (Although maybe expression statistics could help with this?)
There are times when denormalizing is a good idea. For example, if you have terabytes of data, and compress the table, the single column may take up less space. (But if you're trying to save space, why not use a format like "000" instead?).
If there really is a good reason for this, it definitely needs to be documented. Perhaps add a comment on the column.
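As a concrete version of the normalized design, with check constraints supplying the type safety discussed above (all names are illustrative):
CREATE TABLE interaction_config (
    module_name VARCHAR2(30) PRIMARY KEY,
    option_a    NUMBER(1) NOT NULL CHECK (option_a IN (0, 1)),
    option_b    NUMBER(1) NOT NULL CHECK (option_b IN (0, 1)),
    option_c    NUMBER(1) NOT NULL CHECK (option_c IN (0, 1))
);

-- Queries stay trivial and optimizer-friendly:
SELECT module_name
FROM interaction_config
WHERE option_a = 1 AND option_b = 1 AND option_c = 0;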
For a start, if I am reading your question right, you want each of the options to have one of just two possible values, correct?
If so then you could:
have a separate integer (or boolean) column for each option
have an options column that is a string of 1's and 0's, one digit for each option, e.g. "001"
use an 'options' column that is an integer and use a bit value for each option, e.g. optionA == options & 1, optionB == options & 2, etc. (see the sketch below)
some databases have a bit-vector data type which you could use. For MySQL there is the BIT data type, which can store bit strings up to 64 bits long.
There will be a trade-off between code complexity and efficiency for each of these. Ask yourself, how much of the machine's time or storage will be saved by employing each of these options? And how much of your time will be saved?
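A sketch of the integer bit-flag variant from the list above, using Oracle's BITAND to test a bit (names invented):
CREATE TABLE config_flags (
    module_name VARCHAR2(30) PRIMARY KEY,
    options     NUMBER(3)    NOT NULL  -- bit 1 = OptionA, bit 2 = OptionB, bit 4 = OptionC
);

INSERT INTO config_flags VALUES ('orders', 3);   -- OptionA and OptionB set

SELECT module_name
FROM config_flags
WHERE BITAND(options, 2) = 2;                    -- rows with OptionB switched on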
In this instance the 3-column approach is the one I would recommend. Not only does it keep things simple in terms of extracting data, but should you ever wish, you could set values against all 3 columns rather than being limited to one Varchar2 field. If you opt for the single Varchar2 column, it is fairly simple to extract the info you need using the substr command or another variation, and although this isn't heavy work for an Oracle DB, it does put extra work on the server that is not necessary.

How can I find odd house numbers in MS SQL?

The problem is that I need to ignore stray letters in the numbers, e.g. 19A or B417.
Take a look here:
Extracting Numbers with SQL Server
There are several hidden "gotchas" that are explained pretty well in the article.
It depends on how much data you're dealing with, but doing that in SQL is probably going to be slow. Not everyone will agree with me here, but I think all data processing should be done in application code.
I would just take the rows you want, and filter it in the application you're dealing with.
The easiest thing to do here would be to create a CLR function which takes the address. In the CLR function, you would take the first part of the address (assuming it is the house number), which should be delimited by whitespace.
Then, replace any non-numeric characters with an empty string.
You should have a string representing an integer at that point which you can pass to the Parse method on the Int32 class to produce an integer, which you can then check to see if it is odd.
I recommend a CLR function (assuming you are using SQL Server 2005 and above, and can set the compatibility level of the database) here because it's easier to perform string manipulations in .NET than it is in T-SQL.
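For comparison, here is roughly what those steps look like if you stay in T-SQL, which also shows why the CLR route is more pleasant (a sketch; TRY_CAST needs SQL Server 2012+, so use a guarded CAST on older versions):
DECLARE @addr  VARCHAR(100) = '19A Main St';
DECLARE @house VARCHAR(100) =
    LEFT(@addr, CHARINDEX(' ', @addr + ' ') - 1);       -- first whitespace-delimited part

WHILE PATINDEX('%[^0-9]%', @house) > 0                  -- strip non-numeric characters
    SET @house = STUFF(@house, PATINDEX('%[^0-9]%', @house), 1, '');

SELECT CASE WHEN TRY_CAST(@house AS INT) % 2 = 1 THEN 'Odd'
            ELSE 'Even' END;                            -- '19A' -> 19 -> 'Odd'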
Assuming [Address] is the column with the address in it...
SELECT CASE CAST(SUBSTRING(REVERSE(Address),
                           PATINDEX('%[0-9]%', REVERSE(Address)), 1) AS INTEGER) % 2
            WHEN 0 THEN 'Even'
            WHEN 1 THEN 'Odd'
       END
FROM Table
I've been through this drill before. The best alternative is to add a column to the table or to a subsidiary joinable table that stores the inferred numerical value for the purpose. Then use iterative queries to set the column repeatedly until you get sufficient accuracy and coverage. You'll end up encountering stuff like "First, Third," "451a", "1200 South 19th Blvd East", and worse.
Then filter new and edited records as they occur.
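A sketch of the first pass of that strategy (every name here is invented):
ALTER TABLE Address ADD InferredNumber INT NULL;

-- Pass 1: addresses that simply start with digits ('123 Main St', '451a ...').
UPDATE Address
SET InferredNumber = CAST(LEFT(StreetAddress,
        PATINDEX('%[^0-9]%', StreetAddress + 'X') - 1) AS INT)
WHERE InferredNumber IS NULL
  AND StreetAddress LIKE '[0-9]%';

-- Later passes chip away at the leftovers ('First, Third', '1200 South 19th Blvd East')
-- until the remaining NULLs are rare enough to ignore or fix by hand.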
As usual, UDFs should be avoided as being slow and (comparatively) hard to debug.
