Eiffel: how to use the old keyword in an ensure contract clause with detached

How can the old keyword be used in the ensure clause of a feature? The statement below doesn't seem to be valid at runtime:
relationships: CHAIN -- any chain

some_feature
    do
        (...)
    ensure
        relationship_added: attached relationships as l_rel implies
            attached old relationships as l_old_rel and then
                l_rel.count = l_old_rel.count
    end
The following screenshots demonstrate the whole case:
[screenshot: start of routine execution]
[screenshot: end of routine execution]

There are two cases to consider: when the original reference is Void, and when it is not:
attached relationships as r implies r.count =
old (if attached relationships as o then o.count + 1 else 1 end)
The intuition behind the assertion is as follows:
If relationships is Void on exit, we do not care about count.
If relationships is attached on exit, it has one more item compared to the old value. The old value may be attached or Void:
if the old value is attached, the new number of items is the old number plus 1;
if the old value is Void, the new number of items is 1.
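For readers outside Eiffel, the logic of this postcondition can be mimicked in a short Python sketch (hypothetical object structure; the snapshot taken on entry plays the role of the old keyword):

```python
# Hypothetical sketch of the Eiffel postcondition: capture the "old" value
# on entry, mutate, then assert the relationship between old and new counts.
def add_relationship(obj, rel):
    old_relationships = obj.get("relationships")  # snapshot, like `old relationships`
    old_count = len(old_relationships) if old_relationships is not None else 0

    # body of the routine: attach the container if needed, then append
    if obj.get("relationships") is None:
        obj["relationships"] = []
    obj["relationships"].append(rel)

    # postcondition: if attached on exit, the count grew by exactly one,
    # regardless of whether the old value was attached or Void
    assert obj["relationships"] is None or \
        len(obj["relationships"]) == old_count + 1

obj = {"relationships": None}
add_relationship(obj, "r1")
print(len(obj["relationships"]))  # 1
```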


Neo4j Create new label based on other labels with certain properties

I need to create Trajectories based on Points.
A Trajectory can contain any number of Points that meet certain criteria.
Criteria are: cameraSid, trajectoryId, classType and classQual should be equal.
The difference in time (at) of each point must be less or equal to 1 hour.
In order to create a Trajectory we need at least one point.
In order to associate a new Point with an existing Trajectory, the latest associated Point of the trajectory must be no more than 1 hour older than the new point.
If the new Point has the exact same properties but is more than 1 hour older, then a new Trajectory needs to be created.
I've been reading a lot, but I cannot make this work as it should.
This is what I have tried so far:
MATCH (p:Point)
WHERE NOT (:Trajectory)-[:CONTAINS]->(p)
WITH p.cameraSid AS cameraSid, p.trajectoryId AS trajectoryId, p.classType AS classType, p.classQual AS classQual, COLLECT(p) AS points
UNWIND points AS point
MERGE (trajectory:Trajectory{trajectoryId:point.trajectoryId, cameraSid: point.cameraSid, classType: point.classType, classQual: point.classQual, date: date(datetime(point.at))})
MERGE (trajectory)-[:CONTAINS{at:point.at}]->(point)
I have no idea how to create this sort of condition (1hr or less) in the MERGE clause.
Here are the neo4j queries to create some data
// Create points
LOAD CSV FROM 'https://uca54485eb4c5d2a6869053af475.dl.dropboxusercontent.com/cd/0/get/AmR2pn0hC0c-CQW_mSS-TDqHQyi7MNVjPvqffQHhSIyMP37D7UMtfODdHDkNWi6-HqzQdp4ob2Q3326g6imEd26F3sdNJyJuAeNa8wJA2o_E6A/file?dl=1#' AS line
CREATE (:Point{trajectoryId: line[0],at: line[1],cameraSid: line[2],activity: line[3],x: line[4],atEpochMilli: line[5],y: line[6],control: line[7],classQual: line[8],classType: line[9],uniqueIdentifier: line[10]})
// Create Trajectory based on Points
MATCH (p:Point)
WHERE NOT (:Trajectory)-[:CONTAINS]->(p)
WITH p.cameraSid AS cameraSid, p.trajectoryId AS trajectoryId, p.classType AS classType, p.classQual AS classQual, COLLECT(p) AS points
UNWIND points AS point
MERGE (trajectory:Trajectory{trajectoryId:point.trajectoryId, cameraSid: point.cameraSid, classType: point.classType, classQual: point.classQual, date: date(datetime(point.at))})
MERGE (trajectory)-[:CONTAINS{at:point.at}]->(point)
If the link to the CSV file does not work, here is an alternative; in this case, you will have to download the file and then import it locally from your Neo4j instance.
I think this is one of those "just because you can do it in one Cypher statement doesn't mean you should" situations, and you will almost certainly find this easier to do in application code.
Regardless, it can be done using APOC and by introducing an instanceId unique property on your Trajectory nodes.
Possible solution
This almost certainly won't scale, and you'll want indexes (discussed later based on educated guesswork).
First we need to change your import script to make sure that the at property is a datetime and not just a string (otherwise we end up peppering the queries with datetime() calls):
LOAD CSV FROM 'file:///export.csv' AS line
CREATE (:Point{trajectoryId: line[0], at: datetime(line[1]), cameraSid: line[2], activity: line[3],x: line[4], atEpochMilli: line[5], y: line[6], control: line[7], classQual: line[8], classType: line[9], uniqueIdentifier: line[10]})
The following then appears to take your sample data set and add Trajectories per your requirements (and can be run whenever new Points are added).
CALL apoc.periodic.iterate(
'
MATCH (p: Point)
WHERE NOT (:Trajectory)-[:CONTAINS]->(p)
RETURN p
ORDER BY p.at
',
'
OPTIONAL MATCH (t: Trajectory { trajectoryId: p.trajectoryId, cameraSid: p.cameraSid, classQual: p.classQual, classType: p.classType })-[:CONTAINS]-(trajPoint:Point)
WITH p, t, max(trajPoint.at) as maxAt, min(trajPoint.at) as minAt
WITH p, max(case when t is not null AND (
(p.at <= datetime(maxAt) + duration({ hours: 1 }))
AND
(p.at >= datetime(minAt) - duration({ hours: 1 }))
)
THEN t.instanceId ELSE NULL END) as instanceId
MERGE (tActual: Trajectory { trajectoryId: p.trajectoryId, cameraSid: p.cameraSid, classQual: p.classQual, classType: p.classType, instanceId: COALESCE(instanceId, randomUUID()) })
ON CREATE SET tActual.date = date(datetime(p.at))
MERGE (tActual)-[:CONTAINS]->(p)
RETURN instanceId
',
{ parallel: false, batchSize: 1 })
Explanation
The problem as posed is tricky because the decision on whether or not to create a new Trajectory or add the point to an existing one depends entirely on how we handled all prior Points. That means two things:
We need to process the Points in order to make sure we create reliable Trajectories - we start with the earliest, and work up
We need each creation or amend of a Trajectory to be immediately visible for the processing of the next Point - that is to say that we need to handle each Point in isolation, as though it were a mini-transaction
We'll use apoc.periodic.iterate with a batchSize of 1 to give us the behaviour we need.
The first parameter builds the set of nodes to be processed - all those Points which aren't currently part of a Trajectory, sorted by their timestamp.
The second parameter to apoc.periodic.iterate is where the magic's happening so let's break that down - given a point p that isn't part of a Trajectory so far:
OPTIONAL MATCH (t: Trajectory { trajectoryId: p.trajectoryId, cameraSid: p.cameraSid, classQual: p.classQual, classType: p.classType })-[:CONTAINS]-(trajPoint:Point)
WITH p, t, max(trajPoint.at) as maxAt, min(trajPoint.at) as minAt
WITH p, max(case when t is not null AND (
(p.at <= datetime(maxAt) + duration({ hours: 1 }))
AND
(p.at >= datetime(minAt) - duration({ hours: 1 }))
)
THEN t.instanceId ELSE NULL END) as instanceId
This finds any Trajectory that matches the key fields and contains a Point within an hour of the incoming point p, and picks out its instanceId property if there is a suitable match (or the biggest instanceId found if there are multiple matches - we just want to ensure there's zero or one rows by this point).
We'll see what instanceId is all about in a minute, but consider it a unique identifier for a given Trajectory
MERGE (tActual: Trajectory { trajectoryId: p.trajectoryId, cameraSid: p.cameraSid, classQual: p.classQual, classType: p.classType, instanceId: COALESCE(instanceId, randomUUID()) })
ON CREATE SET tActual.date = date(datetime(p.at))
Ensure that there's a Trajectory matching the incoming point's key fields - if the previous code found a matching Trajectory, the MERGE has no work to do. Otherwise it creates the new Trajectory - we set the instanceId property to a new random UUID when we didn't match a Trajectory earlier, to compel the MERGE to create a new node (since no other node will exist with that UUID, even if one matches all the other key fields).
MERGE (tActual)-[:CONTAINS]->(p)
RETURN instanceId
tActual is now the Trajectory that the incoming point p should belong to - create the :CONTAINS relationship
The third parameter is vital:
{ parallel: false, batchSize: 1 })
Important: For this to work, each iteration of the 'inner' Cypher statement has to happen exactly in order, so we force a batchSize of 1 and disable parallelism to prevent APOC from scheduling the batches in any way other than one-at-a-time.
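Outside the database, the ordered, one-at-a-time assignment that the APOC call enforces can be sketched in plain Python (hypothetical point dictionaries; this mirrors the logic, not the Cypher itself):

```python
from datetime import datetime, timedelta
from uuid import uuid4

# Hypothetical in-memory version of the APOC loop: process points in time
# order; attach each to a matching trajectory whose nearest point is within
# one hour, otherwise start a new trajectory with a fresh instance id.
def build_trajectories(points):
    trajectories = []  # each: {"key": ..., "instance_id": ..., "points": [...]}
    for p in sorted(points, key=lambda p: p["at"]):
        key = (p["cameraSid"], p["trajectoryId"], p["classType"], p["classQual"])
        target = None
        for t in trajectories:
            if t["key"] != key:
                continue
            times = [q["at"] for q in t["points"]]
            if min(times) - timedelta(hours=1) <= p["at"] <= max(times) + timedelta(hours=1):
                target = t
                break
        if target is None:
            target = {"key": key, "instance_id": str(uuid4()), "points": []}
            trajectories.append(target)
        target["points"].append(p)
    return trajectories

pts = [
    {"cameraSid": 1, "trajectoryId": 7, "classType": "car", "classQual": "a",
     "at": datetime(2020, 1, 1, 10, 0)},
    {"cameraSid": 1, "trajectoryId": 7, "classType": "car", "classQual": "a",
     "at": datetime(2020, 1, 1, 10, 30)},
    {"cameraSid": 1, "trajectoryId": 7, "classType": "car", "classQual": "a",
     "at": datetime(2020, 1, 1, 13, 0)},  # > 1h gap: starts a new trajectory
]
print(len(build_trajectories(pts)))  # 2
```

Because each point sees the trajectories created by all earlier points, the loop body must run strictly in order, which is exactly why the APOC call needs batchSize 1 and parallel false.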
Indexing
I think the performance of the above is going to degrade quickly as the size of the import grows and the number of Trajectories increases. At a minimum I think you'll want a composite index on
:Trajectory(trajectoryId, cameraSid, classQual, classType) - so that the initial match to find candidate trajectories for a given Point is quick
:Trajectory(trajectoryId, cameraSid, classQual, classType, instanceId) - so that the MERGE at the end finds the existing Trajectory to add to quickly, if one exists
However - that's guesswork from eyeballing the query, and unfortunately you can't see into the query properly to tell what the execution plan is because we're using apoc.periodic.iterate - an EXPLAIN or PROFILE will just tell you that there's one procedure call costing 1 db hit, which is true but not helpful.

Power Query M loop table / lookup via a self-join

First of all, I'm new to Power Query, so I'm taking my first steps. But I need to try to deliver something at work so I can gain some breathing time to learn.
I have the following table (example):
Orig_Item  Alt_Item
5.7        5.10
79.19      79.60
79.60      79.86
10.10
And I need to create a column that will loop the table and display the final Alt_Item. So the result would be the following:
Orig_Item  Alt_Item  Final_Item
5.7        5.10      5.10
79.19      79.60     79.86
79.60      79.86     79.86
10.10
Many thanks
Actually, this is far too complicated for a first Power Query experience.
If that's what you've got to do, then so be it, but you should be aware that you are starting with a quite difficult task.
Small detail: I would expect the last Final_Item to be 10.10. According to the example, the Final_Item will be null if Alt_Item is null. If that is not correct, well that would be a nice first step for you to adjust the code below accordingly.
You can create a new blank query, copy and paste this code in the Advanced Editor (replacing the default code) and adjust the Source to your table name.
let
    Source = Table.Buffer(Table1),
    AddedFinal_Item =
        Table.AddColumn(
            Source,
            "Final_Item",
            each if [Alt_Item] = null
                 then null
                 else List.Last(
                     List.Generate(
                         () => [Final_Item = [Alt_Item], Continue = true],
                         each [Continue],
                         each [Final_Item =
                                   Table.First(
                                       Table.SelectRows(
                                           Source,
                                           (x) => x[Orig_Item] = [Final_Item]),
                                       [Alt_Item = "not found"]
                                   )[Alt_Item],
                               Continue = Final_Item <> "not found"],
                         each [Final_Item])))
in
    AddedFinal_Item
This code uses function List.Generate to perform the looping.
For performance reasons, the table should always be buffered in memory (Table.Buffer), before invoking List.Generate.
List.Generate is one of the most complex Power Query functions.
It requires 4 arguments, each of which is a function in itself.
In this case the first argument starts with () and the other 3 with each (it should be clear from the outline above: they are aligned).
Argument 1 defines the initial values: a record with fields Final_Item and Continue.
Argument 2 is the condition to continue: if an item is found.
Argument 3 is the actual transformation in each iteration: the Source table is searched (with Table.SelectRows) for an Orig_Item equal to Alt_Item. This is wrapped in Table.First, which returns the first record (if any is found) and accepts a default value if nothing is found, in this case a record with field Alt_Item with value "not found". From this result the value of record field [Alt_Item] is returned, which is either the value from the first record, or "not found" from the default value.
If the value is "not found", then Continue becomes false and the iterations will stop.
Argument 4 is the value that will be returned: Final_Item.
List.Generate returns a list of all values from each iteration. Only the last value is required, so List.Generate is wrapped in List.Last.
Final remark: actual looping is rarely required in Power Query and I think it should be avoided as much as possible. In this case, however, it is a feasible solution as you don't know in advance how many Alt_Items will be encountered.
An alternative to List.Generate is using a recursive function.
Also List.Accumulate is close to looping, but that has a fixed number of iterations.
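The loop that List.Generate performs is easier to see in an imperative sketch; here is a hypothetical Python version, with a dict standing in for the buffered table:

```python
# Hypothetical Python equivalent of the List.Generate loop: follow the
# Orig_Item -> Alt_Item chain until no further lookup succeeds
# (the "not found" case that stops the M iteration).
orig_to_alt = {"5.7": "5.10", "79.19": "79.60", "79.60": "79.86"}

def final_item(alt_item):
    if alt_item is None:          # null Alt_Item stays null
        return None
    while alt_item in orig_to_alt:
        alt_item = orig_to_alt[alt_item]
    return alt_item

for orig, alt in [("5.7", "5.10"), ("79.19", "79.60"), ("10.10", None)]:
    print(orig, final_item(alt))
```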
This can be solved simply with a self-join, the open question is how many layers of indirection you'll be expected to support.
Assuming just one level of indirection, no duplicates on Orig_Item, the solution is:
let
Source = #"Input Table",
SelfJoin1 = Table.NestedJoin( Source, {"Alt_Item"}, Source, {"Orig_Item"}, "_tmp_" ),
Expand1 = Table.ExpandTableColumn( SelfJoin1, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
ChkJoin1 = Table.AddColumn( Expand1, "Final_Item", each (if [_lkp_] = null then [Alt_Item] else [_lkp_]), type number)
in
ChkJoin1
This is doable with the regular UI, using Merge Queries, then Expand Column and adding a custom column.
If you want to support more than one level of indirection, turn it into a function to be called X times. For data-driven levels of indirection, you wrap the calls in a List.Generate that drops the intermediate tables in a structured column, though that's a much more advanced level of PQ.

Django: Avoiding ABA scenario in databases and ORMs

I have a situation where I need to update votes for a candidate.
Citizens can vote for this candidate, with more than one vote per candidate. i.e. one person can vote 5 votes, while another person votes 2. In this case this candidate should get 7 votes.
Now, I use Django. Here is how the pseudocode looks:
votes = candidate.votes
votes += citizen.vote
candidate.votes = votes
candidate.save()
The problem here, as you can see, is a race condition: the candidate's votes can get overwritten by another request that did its select earlier and saves later.
How can I avoid this with an ORM like Django?
If this is purely an arithmetic expression then Django has a nice API called F expressions
Updating attributes based on existing fields
Sometimes you'll need to perform a simple arithmetic task on a field, such as incrementing or decrementing the current value. The obvious way to achieve this is to do something like:
>>> product = Product.objects.get(name='Venezuelan Beaver Cheese')
>>> product.number_sold += 1
>>> product.save()
If the old number_sold value retrieved from the database was 10, then the value of 11 will be written back to the database.
This can be optimized slightly by expressing the update relative to the original field value, rather than as an explicit assignment of a new value. Django provides F() expressions as a way of performing this kind of relative update. Using F() expressions, the previous example would be expressed as:
>>> from django.db.models import F
>>> product = Product.objects.get(name='Venezuelan Beaver Cheese')
>>> product.number_sold = F('number_sold') + 1
>>> product.save()
This approach doesn't use the initial value from the database. Instead, it makes the database do the update based on whatever value is current at the time that the save() is executed.
Once the object has been saved, you must reload the object in order to access the actual value that was applied to the updated field:
>>> product = Product.objects.get(pk=product.pk)
>>> print(product.number_sold)
42
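The principle behind F() — making the read-modify-write a single atomic operation instead of a fetch-then-save — can be illustrated in plain Python, with a lock standing in for the database's row-level atomicity (hypothetical Candidate class, not Django code):

```python
import threading

# Hypothetical in-memory stand-in: the lock plays the role of the
# database-side atomic UPDATE ... SET votes = votes + n that F() produces.
class Candidate:
    def __init__(self):
        self.votes = 0
        self._lock = threading.Lock()

    def add_votes(self, n):
        # Without the lock, two threads could both read the same old
        # value and one increment would be lost (the race in the question).
        with self._lock:
            self.votes += n

candidate = Candidate()
threads = [threading.Thread(target=candidate.add_votes, args=(1,))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(candidate.votes)  # 100
```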
Perhaps the select_for_update QuerySet method is helpful for you.
An excerpt from the docs:
All matched entries will be locked until the end of the transaction block, meaning that other transactions will be prevented from changing or acquiring locks on them.
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released. If this is not the behavior you want, call select_for_update(nowait=True). This will make the call non-blocking. If a conflicting lock is already acquired by another transaction, DatabaseError will be raised when the queryset is evaluated.
Mind that this is only available in the Django development release (i.e. > 1.3).

What is an appropriate data structure and database schema to store logic rules?

Preface: I don't have experience with rules engines, building rules, modeling rules, implementing data structures for rules, or whatnot. Therefore, I don't know what I'm doing or if what I attempted below is way off base.
I'm trying to figure out how to store and process the following hypothetical scenario. To simplify my problem, say that I have a type of game where a user purchases an object, where there could be 1000's of possible objects, and the objects must be purchased in a specified sequence and only in certain groups. For example, say I'm the user and I want to purchase object F. Before I can purchase object F, I must have previously purchased object A OR (B AND C). I cannot buy F and A at the same time, nor F and B,C. They must be in the sequence the rule specifies. A first, then F later. Or, B,C first, then F later. I'm not concerned right now with the span of time between purchases, or any other characteristics of the user, just that they are the correct sequence for now.
What is the best way to store this information for potentially thousands of objects that allows me to read in the rules for the object being purchased, and then check it against the user's previous purchase history?
I've attempted this, but I'm stuck at trying to implement the groupings such as A OR (B AND C). I would like to store the rules in a database where I have these tables:
Objects
(ID(int),Description(char))
ObjectPurchRules
(ObjectID(int),RequirementObjectID(int),OperatorRule(char),Sequence(int))
But obviously, as you process through the results without the grouping, you get the wrong answer. I would like to avoid excessive string parsing if possible :). One object could have an unknown number of previous required purchases. SQL or pseudocode snippets for processing the rules would be appreciated. :)
It seems like your problem breaks down to testing whether a particular condition has been satisfied.
You will have compound conditions.
So given a table of items:
ID_Item  Description
--------------------
1        A
2        B
3        C
4        F
and given a table of possible actions:
ID_Action  VerbID  ItemID  ConditionID
--------------------------------------
1          BUY     4       1
We construct a table of conditions:
ID_Condition  VerbA  ObjectA_ID  Boolean  VerbB            ObjectB_ID
---------------------------------------------------------------------
1             OWNS   1           OR       MEETS_CONDITION  2
2             OWNS   2           AND      OWNS             3
So OWNS means the id is a key to the Items table, and MEETS_CONDITION means that the id is a key to the Conditions table.
This isn't meant to restrict you. You can add other tables with quests or whatever, and add extra verbs to tell you where to look. Or, just put quests into your Items table when you complete them, and then interpret a completed quest as owning a particular badge. Then you can handle both items and quests with the same code.
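Evaluating the conditions table lends itself to recursion, since MEETS_CONDITION refers back into the same table. A hypothetical Python evaluator (table contents as above, all names invented for illustration):

```python
# Hypothetical evaluator for the conditions table: OWNS checks the user's
# owned item ids; MEETS_CONDITION recurses into another condition row.
items = {1: "A", 2: "B", 3: "C", 4: "F"}

# condition_id -> (verb_a, a_id, boolean, verb_b, b_id)
conditions = {
    1: ("OWNS", 1, "OR", "MEETS_CONDITION", 2),   # A OR (B AND C)
    2: ("OWNS", 2, "AND", "OWNS", 3),
}

def operand(verb, ref, owned):
    if verb == "OWNS":
        return ref in owned
    return check(ref, owned)  # MEETS_CONDITION: recurse

def check(condition_id, owned):
    verb_a, a, op, verb_b, b = conditions[condition_id]
    left = operand(verb_a, a, owned)
    right = operand(verb_b, b, owned)
    return (left or right) if op == "OR" else (left and right)

print(check(1, {1}))      # True: owns A
print(check(1, {2, 3}))   # True: owns B and C
print(check(1, {2}))      # False: owns only B
```

This avoids string parsing entirely: the grouping A OR (B AND C) lives in the table rows, and the recursion reassembles it at check time.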
This is a very complex problem that I'm not qualified to answer, but I've seen lots of references to. The fundamental problem is that for games, quests and items and "stats" for various objects can have non-relational dependencies. This thread may help you a lot.
You might want to pick up a couple books on the topic, and look into using Lua as a rules processor.
Personally I would do this in code, not in SQL. Each item should be its own class implementing an interface (i.e. IItem). IItem would have a method called OkToPurchase that would determine if it is OK to purchase that item. To do that, it would use one or more of a collection of rules (i.e. HasPreviouslyPurchased(x), CurrentlyOwns(x), etc.) that you can build.
The nice thing is that it is easy to extend this approach with new rules without breaking all the existing logic.
Here's some pseudocode:
bool OkToPurchase()
{
    if( HasPreviouslyPurchased('x') && !CurrentlyOwns('y') )
        return true;
    else
        return false;
}

bool HasPreviouslyPurchased( item )
{
    return purchases.contains( item );
}

bool CurrentlyOwns( item )
{
    return user.Items.contains( item );
}

Implementing check constraints with SQL CLR integration

I'm implementing 'check' constraints that simply call a CLR function for each constrained column.
Each CLR function is one or two lines of code that attempts to construct an instance of the user-defined C# data class associated with that column. For example, a "Score" class has a constructor which throws a meaningful error message when construction fails (i.e. when the score is outside a valid range).
First, what do you think of that approach? For me, it centralizes my data types in C#, making them available throughout my application, while also enforcing the same constraints within the database, so it prevents invalid manual edits in management studio that non-programmers may try to make. It's working well so far, although updating the assembly causes constraints to be disabled, requiring a recheck of all constraints (which is perfectly reasonable). I use DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS to make sure the data in all tables is still valid for enabled and disabled constraints, making corrections as necessary, until there are no errors. Then I re-enable the constraints on all the tables via ALTER TABLE [tablename] WITH CHECK CHECK CONSTRAINT ALL. Is there a T-SQL statement to re-enable with check all check constraints on ALL tables, or do I have to re-enable them table by table?
Finally, for the CLR functions used in the check constraints, I can either:
Include a try/catch in each function to catch data construction errors, returning false on error, and true on success, so that the CLR doesn't raise an error in the database engine, or...
Leave out the try/catch, just construct the instance and return true, allowing that aforementioned 'meaningful' error message to be raised in the database engine.
I prefer 2, because my functions are simpler without the error code, and when someone using management studio makes an invalid column edit, they'll get the meaningful message from the CLR like "Value for type X didn't match regular expression '^p[1-9]\d?$'" instead of some generic SQL error like "constraint violated". Are there any severe negative consequences of allowing CLR errors through to SQL Server, or is it just like any other insert/update failure resulting from a constraint violation?
For example, a "Score" class has a constructor which throws a meaningful error message when construction fails (i.e. when the score is outside a valid range). First, what do you think of that approach?
It worries me a bit, because calling a ctor requires memory allocation, which is relatively expensive. For each row inserted, you're calling a ctor -- and only for its side-effects.
Also expensive are exceptions. They're great when you need them, but this is a case where you would use them in a ctor context, but not in a check context.
A refactoring could reduce both costs, by having the check exist as a class static or free function, then both the check constraint and the ctor could call that:
class Score {
private:
    int score;
public:
    static bool valid( int score ) {
        return score > 0;
    }
    Score( int s ) {
        if( !valid( s ) ) {
            throw InvalidParameter();
        }
        score = s;
    }
};
Check constraint calls Score::valid(), no construction or exception needed.
Of course, you still have the overhead, for each row, of a CLR call. Whether that's acceptable is something you'll have to decide.
Is there a T-SQL statement to re-enable with check all check constraints on ALL tables, or do I have to re-enable them table by table?
No, but you can do this to generate the commands:
select 'ALTER TABLE ' + name + ' WITH CHECK CHECK CONSTRAINT ALL;'
from sys.tables;
and then run the resultset against the database.
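The generate-and-run pattern can be sketched as follows (hypothetical table names; in practice the names come from sys.tables and the generated batch is executed against the database):

```python
# Hypothetical sketch: build one re-enable command per table name,
# mirroring the SELECT against sys.tables. Execution is omitted here;
# the generated batch would be run against SQL Server.
tables = ["Scores", "Players", "Matches"]  # stand-ins for sys.tables rows

commands = [
    f"ALTER TABLE {name} WITH CHECK CHECK CONSTRAINT ALL;" for name in tables
]

for cmd in commands:
    print(cmd)
```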
Comments from the OP:
I use base classes called ConstrainedNumber and RegexConstrainedString for all my data types. I could easily move those two classes' simple constructor code to a separate public boolean IsValueValid method as you suggested, and probably will.
The CLR overhead (and memory allocation) would only occur for inserts and updates. Given the simplicity of the methods, and the rate at which table updates will occur, I don't think the performance impact will be anything to worry about for my system.
I still really want to raise exceptions for the information they'll provide to management studio users. I like the IsValueValid method, because it gives me the 'option' of not throwing errors. Within applications using my data types, I could still get the exception by constructing an instance :)
I'm not sure I agree with the exception throwing, but again, the "take-home message" is that by decomposing the problem into parts, you can select what parts you're willing to pay for, without paying for parts you don't use. The ctor you don't use, because you were only calling it to get the side-effect. So we decomposed creation and checking. We can further decompose throwing:
class Score {
private:
    int score;
public:
    static bool isValid( int score ) {
        return score > 0;
    }
    static void checkValid( int score ) {
        if( !isValid( score ) ) {
            throw InvalidParameter();
        }
    }
    Score( int s ) {
        checkValid( s );
        score = s;
    }
};
Now a user can call the ctor, and get the check and possible exception and construction, call checkValid and get the check and exception, or isValid to just get the validity, paying the runtime cost for only what he needs.
Some clarification. These data classes sit one level above the primitive types, constraining data to make it meaningful.
Actually, they sit just above the RegexConstrainedString and ConstrainedNumber<T> classes, which is where we're talking about refactoring the constructor's validation code into a separate method.
The problem with refactoring the validation code is that the Regex necessary for validation exists only in the subclasses of RegexConstrainedString, since each subclass has a different Regex. This means that the validation data is only available to the RegexConstrainedString's constructor, not to any of its methods. So, if I factor out the validation code, callers would need access to the Regex.
public class Password: RegexConstrainedString
{
    internal static readonly Regex regex = CreateRegex_CS_SL_EC( @"^[\w!""#\$%&'\(\)\*\+,-\./:;<=>\?#\[\\\]\^_`{}~]{3,20}$" );

    public Password( string value ): base( value.TrimEnd(), regex ) {} //length enforced by regex, so no min/max values specified
    public Password( Password original ): base( original ) {}

    public static explicit operator Password( string value ) {return new Password( value );}
}
So, when reading a value from the database or reading user input, the Password constructor forwards the Regex to the base class to handle the validation. Another trick is that it trims the end characters automatically, in case the database type is char rather than varchar, so I don't have to remember to do it. Anyway, here is what the main constructor for RegexConstrainedString looks like:
protected RegexConstrainedString( string value, Regex subclasses_static_regex, int? min_length, int? max_length )
{
    _value = (value ?? String.Empty);

    if (min_length != null)
        if (_value.Length < min_length)
            throw new Exception( "Value doesn't meet minimum length of " + min_length + " characters." );

    if (max_length != null)
        if (_value.Length > max_length)
            throw new Exception( "Value exceeds maximum length of " + max_length + " characters." );

    value_match = subclasses_static_regex.Match( _value ); //Match.Synchronized( subclasses_static_regex.Match( _value ) );

    if (!value_match.Success)
        throw new Exception( "Invalid value specified (" + _value + "). \nValue must match regex:" + subclasses_static_regex.ToString() );
}
Since callers would need access to the subclass's Regex, I think my best bet is to implement a IsValueValid method in the subclass, which forwards the data to the IsValueValid method in the RegexConstrainedString base class. In other words, I would add this line to the Password class:
public static bool IsValueValid( string value ) {return IsValueValid( value.TrimEnd(), regex, min_length, max_length );}
I don't like this, however, because I'm replicating the subclass's constructor code, having to remember to trim the string again and pass the same min/max lengths when necessary. This requirement would be forced upon all subclasses of RegexConstrainedString, and it's not something I want to do. Data classes like Password are so simple because RegexConstrainedString handles most of the work, implementing operators, comparisons, cloning, etc.
Furthermore, there are other complications with factoring out the code. The validation involves running and storing a Regex match in the instance, since some data types may have properties that report on specific elements of the string. For example, my SessionID class contains properties like TimeStamp, which return a matched group from the Match stored in the data class instance. The bottom line is that this static method is an entirely different context. Since it's essentially incompatible with the constructor context, the constructor cannot use it, so I would end up replicating code once again.
So... I could factor out the validation code by replicating it and tweaking it for a static context and imposing requirements on subclasses, or I could keep things much simpler and just perform the object construction. The relative extra memory allocated would be minimal, as only a string and Match reference is stored in the instance. Everything else, such as the Match and the string itself, would still be generated by the validation anyway, so there's no way around that.

I could worry about the performance all day, but my experience has been that correctness is more important, because correctness often leads to numerous other optimizations. For example, I don't ever have to worry about improperly formatted or sized data flowing through my application, because only meaningful data types are used, which forces validation to the point-of-entry into the application from other tiers, be it database or UI. 99% of my validation code was removed as a no-longer-necessary artifact, and I find myself only checking for nulls nowadays.

Incidentally, having reached this point, I now understand why including nulls was the billion dollar mistake. It seems to be the only thing I have to check for anymore, even though nulls are essentially non-existent in my system. Complex objects having these data types as fields cannot be constructed with nulls, but I have to enforce that in the property setters, which is irritating, because they otherwise would never need validation code... only code that runs in response to changes in values.
UPDATE:
I simulated the CLR function calls both ways, and found that when all data is valid, the performance difference is only fractions of a millisecond per thousand calls, which is negligible. However, when roughly half the passwords are invalid, throwing exceptions in the "instantiation" version, it's three orders of magnitude slower, which equates to about 1 extra second per 1000 calls. The magnitudes of difference will of course multiply as multiple CLR calls are made for multiple columns in the table, but that's a factor of 3 to 5 for my project.

So, is an extra 3 - 5 seconds per 1000 updates acceptable to me, as a trade-off for keeping my code very simple and clean? Well, that depends on the update rate. If my application were getting 1000 updates per second, a 3 - 5 second delay would be devastating. If, on the other hand, I was getting 1000 updates a minute or an hour, it may be perfectly acceptable. In my situation, I can tell you now that it's quite acceptable, so I think I'll just go with the instantiation method and allow the errors through.

Of course, in this test I handled the errors in the CLR instead of letting SQL Server handle them. Marshalling the error info to SQL Server, and then possibly back to the application, could definitely slow things down much more. I guess I will have to fully implement this to get a real test, but from this preliminary test, I'm pretty sure what the results will be.
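The shape of the two styles being compared can be reproduced in plain Python (hypothetical simplified regex and names; this mirrors the structure of the measurement, not the actual CLR timings):

```python
import re

# Hypothetical stand-in for the real password regex in the classes above.
PASSWORD_RE = re.compile(r"^\w{3,20}$")

def is_valid(value):
    # boolean-return check: no exception on invalid input
    return bool(PASSWORD_RE.match(value))

def construct(value):
    # ctor-style check: throws on invalid input, like the data class ctor
    if not PASSWORD_RE.match(value):
        raise ValueError("invalid password")
    return value

values = ["goodpass", "x"] * 500  # roughly half invalid, as in the test above

def via_flag():
    return sum(1 for v in values if is_valid(v))

def via_exception():
    ok = 0
    for v in values:
        try:
            construct(v)
            ok += 1
        except ValueError:
            pass
    return ok

print(via_flag(), via_exception())  # both count 500 valid values
```

Wrapping each function in a timer (e.g. timeit) over this input reproduces the comparison: identical results, but the exception path pays the cost of raising and catching for every invalid value.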
