Any harm in using BEGIN / END to organize SQL code? - sql-server

I am not used to writing very long SPs, and I quickly find them hard to manage when working with wide tables. Even a simple UPSERT can take a very long line or tens of lines of code (I find the column-per-line style easier to read, but working with these very wide tables is a real pain for me). Gradually I found myself starting to use BEGIN/END to wrap those UPSERTs and other basic operations on the same table, to group them into logical sections, so that in the IDE they can easily be folded and unfolded, checked, and reviewed with comments. Note: I am not writing them as control blocks. Just simple BEGIN/END to wrap code into a foldable section.
Will adding a few extra BEGIN/END blocks to group and fold/unfold lengthy code lead to any side effects? Currently I am not seeing any yet...
If so, what could they be? Is there a better way to organize this kind of code in an SP?
Oh, by folding I mean:
CREATE PROCEDURE / MULTI-STATEMENT FUNCTION
...
BEGIN
    -- UPSERT QC records
    BEGIN
        UPDATE QC_SECT1              <-- fold[]
        SET COL1 = ...,
            COL2 = ...
        .....
        IF @@ROWCOUNT ....           <-- fold[click 1]
        INSERT INTO ...              <-- fold[click 2]
        (
            COL1,
            ....
        VALUES                       <-- fold[click 3]
        (
            ...
    END
    -- calculate distribution
    BEGIN
        DECLARE @GRP1_.....
        DECLARE @GRP2_....
        WITH .....
            ....
        ) AS PRE_CAL1                <-- fold[click 1]
        ) AS PRE_CAL2                <-- fold[click 2]
        ) AS ...                     <-- fold[click 3]
    END
END
I find it very inconvenient to inspect long scripts, so gradually I started using BEGIN/END. In VS Code it takes many clicks to fold a single CTE statement; BEGIN/END makes it a lot easier. But I really want to make sure I am not doing something wrong.
PS: The ideas about grouping columns per line are very valid; I fully agree.
Regarding the col/line style: it's not that I am really fond of it, but I guess in my current situation many people would probably prefer it too. The wide tables in the vendor DB have long prefixes that, I'd say, group the columns; listing them vertically helps align them so they can be distinguished more easily... I guess mine is currently a special case.
Thanks.

You can do this. It's a good idea to structure code like that, because it's integrated with the language and the IDE in a way that comments are not. In C-like languages I sometimes use {} blocks for the same purpose when larger methods are unavoidable.
I found the col/line style easier to read for me
If that's easier for you, then it makes sense. But maybe it also makes sense to train your eyes to accept a style where many columns sit on one line. This saves a tremendous amount of vertical space: more fits on the same screen, and clarity increases.
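For example (hypothetical table and columns), the one-column-per-line layout

UPDATE QC_SECT1
SET COL1 = 1,
    COL2 = 2,
    COL3 = 3,
    COL4 = 4;

compacts to less than half the height:

UPDATE QC_SECT1
SET COL1 = 1, COL2 = 2, COL3 = 3, COL4 = 4;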

No, absolutely not. There is no harm in wrapping blocks of code in BEGIN/END blocks. I have been doing it for 10+ years without consequence. The optimizer basically ignores it unless there is a reason to evaluate the BEGIN/END logic (e.g. for a loop or other "WHILE" condition.)
Using BEGIN/END code blocks in SSMS makes it possible to quickly collapse/expand large blocks of code. I recently used it on a 110-line script that collects routine metadata, and even at that length the code stayed easily readable.
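A minimal sketch of the pattern (hypothetical procedure, table, and column names; not that metadata script):

CREATE PROCEDURE dbo.usp_UpsertQcSection
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN -- UPSERT QC records: one click folds the whole section
        UPDATE dbo.QC_SECT1
        SET COL1 = 1,
            COL2 = 2
        WHERE ID = 42;

        IF @@ROWCOUNT = 0
            INSERT INTO dbo.QC_SECT1 (ID, COL1, COL2)
            VALUES (42, 1, 2);
    END

    BEGIN -- calculate distribution: independently foldable
        DECLARE @Grp1Total int;
        SELECT @Grp1Total = COUNT(*) FROM dbo.QC_SECT1;
    END
END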

Related

Raku : is there a SUPER fast way to turn an array into a string without the spaces separating the elements?

I need to convert thousands of binary byte strings, each about a megabyte long, into ASCII strings. This is what I have been doing, and it seems too slow:
sub fileToCorrectUTF8Str ($fileName) {  # binary file
    my $finalString = "";
    my $fileBuf = slurp($fileName, :bin);
    for @$fileBuf { $finalString = $finalString ~ $_.chr; };
    return $finalString;
}
~@b turns @b into a string with all the elements separated by a space, but this is not what I want. If @b = < a b c d >, then ~@b is "a b c d"; but I just want "abcd", and I want to do this REALLY fast.
So, what is the best way? I can't really use hyper for parallelism because the final string is constructed sequentially. Or can I?
TL;DR On an old rakudo, .decode is about 100X as fast.
In longer form to match your code:
sub fileToCorrectUTF8Str ($fileName) {  # binary file
    slurp($fileName, :bin).decode
}
Performance notes
First, here's what I wrote for testing:
# Create a million-and-1-byte-long file:
spurt 'foo', "1234\n6789\n" x 1e5 ~ 'Z', :bin;

# (`say` the last character to check the work is done)
say .decode.substr(1e6) with slurp 'foo', :bin;
# say .substr(1e6) with fileToCorrectUTF8Str 'foo';

say now - INIT now;
On TIO.run's 2018.12 rakudo, the above .decode weighs in at about .05 seconds per million byte file instead of about 5 seconds for your solution.
You could/should of course test on your system and/or using later versions of rakudo. I would expect the difference to remain in the same order, but for the absolute times to improve markedly as the years roll by.[1]
Why is it 100X as fast?
Well, first, @ on a Buf / Blob explicitly forces Raku to view the erstwhile single item (a buffer) as a plural thing (a list of elements, aka multiple items). That means high-level iteration which, for a million-element buffer, immediately means a million high-level iterations/operations instead of just one high-level operation.
Second, using .decode not only avoids iteration but only incurs relatively slow method call overhead once per file whereas when iterating there are potentially a million .chr calls per file. Method calls are (at least semantically) late-bound which is in principle relatively costly compared to, for example, calling a sub instead of a method (subs are generally early bound).
That all said:
Remember Caveat Empty[1]. For example, rakudo's standard classes generate method caches, and it's plausible the compiler just in-lines the method anyway, so it's possible there is negligible overhead for the method call aspect.
See also the doc's Performance page, especially Use existing high performance code.
Is the Buf.Str error message LTA?
Update See Liz++'s comment.
If you try to use .Str on a Buf or Blob (or equivalent, such as using the ~ prefix on it) you'll get an exception. Currently the message is:
Cannot use a Buf as a string, but you called the Str method on it
The doc for .Str on a Buf/Blob currently says:
In order to convert to a Str you need to use .decode.
It's arguably LTA that the error message doesn't suggest the same thing.
Then again, before deciding what to do about this, if anything, we need to consider what, and how, folk could learn from anything that goes wrong, including signals about it, such as error messages, and also what and how they do in fact currently learn, and bias our reactions toward building the right culture and infrastructure.
In particular, if folk can easily connect between an error message they see, and online discussion that elaborates on it, that needs to be taken into account and perhaps encouraged and/or made easier.
For example, there's now this SO covering this issue with the error message in it, so a google is likely to get someone here. Leaning on that might well be a more appropriate path forward than changing the error message. Or it might not. The change would be easy...
Please consider commenting below and/or searching existing rakudo issues to see if improvement of the Buf.Str error message is being considered and/or whether you wish to open an issue to propose it be altered. Every rock moved is at least great exercise, and, as our collective effort becomes increasingly wise, improves (our view of) the mountain.
Footnotes
[1] As the well known Latin saying Caveat Empty goes, both absolute and relative performance of any particular raku feature, and more generally any particular code, is always subject to variation due to factors including one's system's capabilities, its load during the time it's running the code, and any optimization done by the compiler. Thus, for example, if your system is "empty", then your code may run faster. Or, as another example, if you wait a year or three for the compiler to get faster, advances in rakudo's performance continue to look promising.

How to use pulp to generate the variables and constraints of a sparse matrix?

Hi there,
I am new to pulp. I learned pulp from some examples I found online. These examples are very helpful, and now I am able to write simple models by myself. But I still find it difficult to build complex models, especially models with sparse matrices.
Could you please kindly post some complex examples with sparse matrices and complex constraints? I want to learn how to create only the necessary variables, instead of a simple definition such as y = LpVariable.dicts("y", (Factorys, Customers), 0, 1, LpBinary).
I have another question: what happens if I simply use y = LpVariable.dicts("y", (Factorys, Customers), 0, 1, LpBinary) to define the variables, most of which are useless in the model's objective function and constraints, and then add some constraints to explicitly set those useless variables to 0? Is pulp able to first identify such useless variables and remove them, and then run an Integer Programming algorithm (such as B&B or B&C) on the problem at the reduced size? If so, it looks like the "set useless variables to 0" method would not decrease the solution speed at all. Am I right?
This may help:
http://www.stuartmitchell.com/journal/2012/2/3/my-top-n-tips-for-python-coding-in-optimisation-1.html
In particular, first generate a set of factories and customers that is sparse.
factories_customers = [(f, c) for f in factories for c in customers
                       if <insert your condition here>]
Then use:
y = LpVariable.dicts("y", factories_customers, 0, 1, LpBinary)
Pulp does not remove "useless" variables and constraints itself, so the model build time will be long. However, the solution algorithms (CBC by default) contain pre-solve routines that will remove such variables.
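Putting it together, a minimal self-contained sketch (the data, costs, and sparsity condition are hypothetical):

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

factories = ["F1", "F2"]
customers = ["C1", "C2", "C3"]
# Hypothetical sparsity: only these routes are allowed, with these costs.
cost = {("F1", "C1"): 4, ("F1", "C2"): 6, ("F2", "C2"): 3, ("F2", "C3"): 5}

# Sparse index set: variables are created only for allowed pairs.
factories_customers = list(cost)

prob = LpProblem("sparse_assignment", LpMinimize)
y = LpVariable.dicts("y", factories_customers, 0, 1, LpBinary)

prob += lpSum(cost[fc] * y[fc] for fc in factories_customers)

# Each customer is served by exactly one allowed factory.
for c in customers:
    prob += lpSum(y[(f, cc)] for (f, cc) in factories_customers if cc == c) == 1

prob.solve()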

Is this confusing TSQL syntax more efficient? Is it really that confusing?

I would have searched but I don't even know how to search for something like this, so here goes.
I work in a company where I'm looking at lots of older stored procs (currently we're on SQL Server 2012; I don't know when they were written, but likely 3-5+ years ago). I continually come across these little nuggets in SELECT statements, and they are extremely confusing to me. Ultimately they are all math-based, so maybe there's a performance reason for them? Maybe they are way more common than I thought, and I just haven't been exposed to them?
Here is an example, imagine it is buried in a SELECT clause, selecting from a table that has a numerical "balance" column (and please forgive me if my parens don't match up exactly ... you will get the idea).
SELECT
    ...
    HasBalance = LEFT('Y', 1 * ABS(SIGN(SUM(balance)))) + LEFT('N', 1 * (1 - ABS(SIGN(SUM(balance)))))
    ...
I mean, it wasn't that hard to figure out that essentially it produces 'Y' or 'N' by concatenating two strings ('Y' + '' or '' + 'N'), each generated by using three functions to reduce the balance to 1 or 0, multiplied by 1 to determine how many characters each LEFT takes (ultimately 1 or 0).
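For comparison, the conventional equivalent (ignoring the edge case where SUM(balance) is NULL, which makes the original expression NULL as well) would just be:

HasBalance = CASE WHEN SUM(balance) <> 0 THEN 'Y' ELSE 'N' END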
But come on.
Here is another example:
SUM(x.amount * CHARINDEX('someCode', x.code) * CHARINDEX(x.code, 'someCode'))
This one essentially says x.code MUST equal 'someCode', or the term returns 0; otherwise it returns x.amount. If either CHARINDEX fails, it throws a *0 into the mix, zeroing out the product. (Both CHARINDEX calls return 1 only when each string contains the other, i.e. when the two are equal.)
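Written conventionally (again modulo NULL handling), that's:

SUM(CASE WHEN x.code = 'someCode' THEN x.amount ELSE 0 END)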
There are lots of others; they're not impossible to figure out, but I still have to spend time working out what they're doing.
Is the obscurity as bad as I think it is? If so, is it worth the tradeoff (assuming there is a tradeoff)?
Thanks!

How to go about creating a prolog program that can work backwards to determine steps needed to reach a goal

I'm not sure exactly what I'm trying to ask. I want to be able to write some code that can take an initial state, a final state, and some rules, and determine the paths/choices to get there.
So think, for example, of a game like Starcraft. To build a factory I need to have a barracks and a command center already built. So if I have nothing and I want a factory, I might say -> Command Center -> Barracks -> Factory. Each thing takes time and resources, and that should be noted and considered in the path. If I want my factory at 5 minutes there are fewer options than if I want it at 10.
Also, the engine should be able to calculate available resources and use them effectively. Those three buildings might cost 600 total minerals, but the engine should plan the Command Center for when it would have 200 (or whatever it costs).
This would ultimately have requirements similar to: 10 marines @ 5 minutes, infantry weapons upgrade @ 6:30, 30 marines @ 10 minutes, factory @ 11, etc.
So, how do I go about doing something like this? My first thought was to use some procedural language and make all the decisions from the ground up. I could simulate the system, branching and making different choices. Ultimately, some choices will quickly make it impossible to reach goals later (if I build 20 Supply Depots I'm probably not going to make that factory on time).
So then I thought, weren't declarative languages designed for this? I tried to write some Prolog, but I've been having trouble with things like time and distance calculations. And I'm not sure of the best way to return the "plan".
I was thinking I could write:
depends_on(factory, barracks).
depends_on(barracks, command_center).
builds_from(marine, barracks).
build_time(command_center, 60).
build_time(barracks, 45).
build_time(factory, 30).
minerals(command_center, 400).
...

build(X) :-
    depends_on(X, Y),
    build_time(X, T),
    minerals(X, M),
    ...
Here's where I get confused. I'm not sure how to construct this predicate and a query to get anything even close to what I want. I would have to somehow account for the rate at which minerals are gathered during the time spent building, and for other possible paths with extra resources. If I only want 1 marine in 10 minutes, I would want the engine to generate lots of plans, because there are lots of ways to end with 1 marine at 10 minutes (maybe cut it off after so many; not sure how you do that in Prolog).
I'm looking for advice on how to continue down this path, or advice about other options. I haven't been able to find anything more useful than Towers of Hanoi and ancestry examples for AI, so even some good articles explaining how to use Prolog to do real things would be amazing. And if I somehow get these rules set up in a useful way, how do I get the "plans" Prolog came up with (the ways it solved the query), other than writing to stdout like all the Towers of Hanoi examples do? Or is that the preferred way?
My other question: my main code is in Ruby (and potentially other languages), and the options for communicating with Prolog are calling my Prolog program from within Ruby, accessing a virtual file system from within Prolog, or some kind of database structure (unlikely). I'm using SWI-Prolog at the moment. Would I be better off doing this procedurally in Ruby, or would constructing this in a declarative language like Prolog or Haskell be worth the extra integration effort?
I'm sorry if this is unclear, I appreciate any attempt to help, and I'll re-word things that are unclear.
Your question is typical and very common for users of procedural languages who first try Prolog. It is very easy to solve: you need to think in terms of relations between successive states of your world. A state of your world consists, for example, of the time elapsed, the minerals available, the things you have already built, etc. Such a state can be easily represented with a Prolog term, and could look for example like time_minerals_buildings(10, 10000, [barracks,factory]). Given such a state, you need to describe what the state's possible successor states look like. For example:
state_successor(State0, State) :-
    State0 = time_minerals_buildings(Time0, Minerals0, Buildings0),
    Time is Time0 + 1,
    can_build_new_building(Buildings0, Building),
    building_minerals(Building, MB),
    Minerals is Minerals0 - MB,
    Minerals >= 0,
    State = time_minerals_buildings(Time, Minerals, [Building|Buildings0]).
I am using the explicit naming convention (State0 -> State) to make clear that we are talking about successive states. You can of course also pull the unifications into the clause head. The example code is purely hypothetical and could look rather different in your final application. In this case, I am describing that the new state's elapsed time is the old state's time + 1, that the new amount of minerals decreases by the amount required to build Building, and that I have a predicate can_build_new_building(Bs, B), which is true when a new building B can be built assuming that the buildings given in Bs are already built. I assume it is a non-deterministic predicate in general, and will yield all possible answers (= new buildings that can be built) on backtracking, and I leave it as an exercise for you to define such a predicate.
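If it helps, here is one possible shape such a predicate could take (a sketch only; it reuses depends_on/2 facts in the style of the question and assumes building_minerals/2 facts enumerate all known buildings):

% Building can be built when it is known, not yet built,
% and all of its prerequisites are already built.
can_build_new_building(Built, Building) :-
    building_minerals(Building, _),
    \+ member(Building, Built),
    forall(depends_on(Building, Dep), member(Dep, Built)).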
Given such a predicate state_successor/2, which relates a state of the world to its direct possible successors, you can easily define a path of states that lead to a desired final state. In its simplest form, it will look similar to the following DCG that describes a list of successive states:
states(State0) -->
    (   { final_state(State0) } -> []
    ;   [State0],
        { state_successor(State0, State1) },
        states(State1)
    ).
You can then use for example iterative deepening to search for solutions:
?- initial_state(S0), length(Path, _), phrase(states(S0), Path).
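For the query to run, initial_state/1 and final_state/1 need definitions; hypothetical examples in the same spirit:

% Hypothetical endpoints: start at time 0 with 10000 minerals and
% nothing built; succeed once a factory appears among the buildings.
initial_state(time_minerals_buildings(0, 10000, [])).
final_state(time_minerals_buildings(_Time, _Minerals, Buildings)) :-
    member(factory, Buildings).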
Also, you can keep track of states you already considered and avoid re-exploring them etc.
The reason you get confused with the example code you posted is essentially that build/1 does not have enough arguments to describe what you want. You need at least two arguments: One is the current state of the world, and the other is a possible successor to this given state. Given such a relation, everything else you need can be described easily. I hope this answers your question.
Caveat: my Prolog is rusty and shallow, so this may be off base.
Perhaps a 'difference engine' approach would be appropriate:
given a goal like 'build factory',
backwards-chaining relations would check for has-barracks and tell you first to build-barracks,
which would check for has-command-center and tell you to build-command-center,
and so on,
accumulating a plan (and costs) along the way.
If this is practical, it may be more flexible than a state-based approach... or it may be the same thing wearing a different t-shirt!
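A rough sketch of that backward-chaining idea (a hypothetical plan/2 predicate; it reuses the question's depends_on/2 facts and assumes each building has at most one prerequisite):

% plan(+Goal, -Plan): Plan lists the build steps, prerequisites first.
plan(Goal, Plan) :-
    (   depends_on(Goal, Pre)
    ->  plan(Pre, PrePlan),
        append(PrePlan, [build(Goal)], Plan)
    ;   Plan = [build(Goal)]
    ).

With the question's facts, plan(factory, P) yields P = [build(command_center), build(barracks), build(factory)].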

What's the fastest way to compare to an empty string in T-SQL?

I am working in a SQL Server environment, heavy on stored procedures, where a lot of the procedures use 0 and '' instead of NULL to indicate that no meaningful parameter value was passed.
These parameters appear frequently in the WHERE clauses. The usual pattern is something like:
WHERE ISNULL(SomeField, '') =
    CASE @SomeParameter
        WHEN '' THEN ISNULL(SomeField, '')
        ELSE @SomeParameter
    END
For various reasons, it's a lot easier to make a change to a proc than a change to the code that calls it. So, given that the calling code will be passing empty strings for null parameters, what's the fastest way to compare to an empty string?
Some ways I've thought of:
@SomeParameter = ''
NULLIF(@SomeParameter, '') IS NULL
LEN(@SomeParameter) = 0
I've also considered inspecting the parameter early on in the proc and setting it to NULL if it's equal to '', and then just doing a @SomeParameter IS NULL test in the actual WHERE clause.
What other ways are there? And what's fastest?
Many thanks.
Sorting out the parameter at the start of the proc must be faster than multiple conditions in a where clause or using a function in one. The more complex the query, or the more records that have to be filtered, the greater the gain.
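A minimal sketch of that approach (hypothetical parameter, table, and column names, matching the pattern above):

IF @SomeParameter = '' SET @SomeParameter = NULL;

SELECT SomeField
FROM dbo.SomeTable
WHERE (@SomeParameter IS NULL
       OR ISNULL(SomeField, '') = @SomeParameter);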
The bit that would scare me is whether this lack of nullability in the procedure arguments has got into the data as well. If it has, then when you start locking things down, your queries are going to come back with the "wrong" results.
If this product has some longevity, then I'd say easy is the wrong solution long term, and it should be corrected in the calling applications. If it doesn't, then maybe you should just leave it alone, as all you would be doing is sweeping the mess from under one rug to under another...
How are you going to test these changes? The chances of you introducing a wee error, while bored out of your skull making the same change again and again and again, are very high.
