Static methods in Gosu and thread-safety

I have below function in a .gs class, which gets called when accessing specific Claim information -
public static function testVisibility(claim : Claim) : boolean {
    if (claim.State == ClaimState.TC_OPEN) {
        return true;
    } else {
        return false;
    }
}
My question -
a) If two users access their respective Claims, this function gets called twice: once with the first user's Claim instance and once with the second user's. If the access is simultaneous, will two copies of the same function be invoked? That shouldn't be the case, since a static function has only one copy. So, if there is only one copy, how is thread safety ensured? Will the calls run one after another?
b) Like Java, does Gosu also use the heap to run static functions?

It seems you are a little confused about the definition here. Thread safety is a property concerned with protecting the integrity of data shared between threads. Your example function reads only its parameter, which is local to each call, so it is thread-safe whether it is static or not.
a) For the reason above, there is no thread-safety problem here: the two calls work with two different sets of data, each in its own stack frame.
b) Given that Gosu is built to run on the JVM and produces .class files, I believe that for the most part (if not entirely, aside from the syntax) it behaves like Java: parameters and local variables live on each thread's stack, while objects themselves live on the shared heap.
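To make answer (b) concrete, here is a small Java sketch (Gosu should behave the same on the JVM, though that part is my assumption): a single static method is invoked from two threads at once, and each invocation works on its own parameter in its own stack frame, so the results never interfere.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StaticCallDemo {
    // One copy of the method, but every invocation gets its own
    // stack frame, so concurrent calls cannot see each other's data.
    static boolean isOpen(String claimState) {
        return "OPEN".equals(claimState);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Boolean> first = pool.submit(() -> isOpen("OPEN"));
        Future<Boolean> second = pool.submit(() -> isOpen("CLOSED"));
        System.out.println(first.get() + " " + second.get()); // true false
        pool.shutdown();
    }
}
```

Note that the calls are not serialized: both can run at the same time, and neither thread waits for the other.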

This is a classic confusion when we start learning any programming language.
Consider 100 people accessing a web application at exactly the same moment. Per your doubt, the static function would return or share the same content for all 100 people.
In fact, no such sharing happens for the function's parameters and local variables, because the server creates a separate THREAD for each connection and the request is processed on that thread (the one-thread-per-connection model).
So the one static function runs on 100 different threads, and each thread's parameters and locals live on its own stack, which other threads cannot touch (directly). This is how web applications work. Note that this applies only to locals and parameters: a static *variable*, by contrast, is genuinely shared by all threads and must be synchronized.
If we need to share a variable or class among threads, one common approach is to make it a singleton.
E.g., for database connections, we don't need to create a connection every time if an established connection (or pool) already exists. In that case the connection class can be a singleton.
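A minimal Java sketch of that singleton suggestion (the ConnectionPool class is invented for illustration); the initialization-on-demand holder idiom gives lazy, thread-safe construction without explicit locking:

```java
// Hypothetical connection-pool singleton. The JVM guarantees that
// Holder is initialized exactly once, even when many threads race
// to call getInstance() for the first time.
public class ConnectionPool {
    private ConnectionPool() { }  // no outside instantiation

    private static class Holder {
        static final ConnectionPool INSTANCE = new ConnectionPool();
    }

    public static ConnectionPool getInstance() {
        return Holder.INSTANCE;
    }
}
```

Every thread calling ConnectionPool.getInstance() sees the same instance.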
Hope this makes sense. :)
-Aravind

Related

Why should I use thread-specific data?

Since each thread has its own stack, its private data can be kept there. For example, each thread can allocate some heap memory to hold a data structure and use the same interface to manipulate it. Then why is thread-specific data helpful?
The only case I can think of is that each thread may have many kinds of private data. If we need to access that private data in any function called within the thread, we have to pass the data as arguments to all those functions, which is tedious and error-prone.
Thread-local storage is a solution for avoiding global state. If data isn't shared across threads but is accessed by several functions, you can make it thread-local: there's no need to worry about breaking reentrancy, and it makes debugging that much easier.
From a performance point of view, thread-local data is one way of avoiding false sharing. Say you have two threads: one writes a variable x and the other reads a variable y. If these are global variables, they may end up on the same cache line. Then every write to x invalidates that cache line in the other core's cache, even though y itself never changed, and cache performance degrades for the reader of y for no good reason.
If you used thread-local data, one thread would only store the variable x and the other would only store the variable y, thus avoiding false sharing. Bear in mind, though, that there are other ways to go about this, e.g. cache line padding.
Like the stack, thread-local data is dedicated to each thread, but unlike stack data it persists across function calls: a stack variable may already be overwritten once its function has returned.
The alternative would be to use adjacent pieces of global data dedicated to each thread, but that has some performance implications when the CPU caches are concerned. Since different threads are likely to run on different cores, such "sharing" of a global piece of data may bring some undesirable performance degradation because an access from one core may invalidate the cache-line of another, with the latter contributing to more inter-core traffic to ensure cache consistency.
In contrast, working with thread-local data should conceptually not involve messing up with the cache of other cores.
Think of thread-local storage as another kind of global variable. It's global in the sense that you don't have to pass it around; different pieces of code can access it as they please (given the declaration, of course). However, each thread has its own separate copy. Normally, globals are extra bad in multithreaded programming because other threads can change the value. If you make a variable thread-local, only your thread can see it, so it is impossible for another thread to unexpectedly change it.
Another use case is when you are forced to use a (badly designed) API that expects you to use global variables to carry information to callback functions. This is a simple instance of being forced into a global variable, but using thread local storage to make it thread safe.
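A hedged Java sketch of that callback situation (the "API" here is imaginary): the value the callback needs travels through a ThreadLocal instead of a plain static field, so concurrent threads cannot trample each other's context.

```java
public class CallbackContext {
    // A plain static String here would be shared by all threads;
    // ThreadLocal gives each thread its own private slot.
    private static final ThreadLocal<String> context = new ThreadLocal<>();

    // Stands in for the callback the badly designed API invokes
    // with no arguments.
    static String callback() {
        return "processed:" + context.get();
    }

    static String runWithContext(String value) {
        context.set(value);
        try {
            return callback();   // the API would call this for us
        } finally {
            context.remove();    // avoid leaks on pooled threads
        }
    }
}
```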
Well, I've been writing multithreaded apps for 30-odd years and have never found any need for TLS. If a thread needs a DB connection that the DB binds to the thread, the thread can open one of its own and keep it on the stack. Since threads cannot be called, only signaled, there is no problem. Every time I've looked at this magic 'TLS', I've realized it's not a solution to my problem.
With my typical message-passing design, where objects are queued in to threads that never terminate, there is just no need for TLS.
With thread-pools it's even more useless.
I can only say that using TLS=bad design. Someone put me right, if you can :)
I've used thread local storage for database connections and sometimes for request / response objects. To give two examples, both from a Java webapp environment, but the principles hold.
A web app might consist of a large spread of code that calls various subsystems, many of which need to access the database. In my case, I had written each subsystem that required the db to fetch a connection from a db pool, use it, and return it to the pool. Thread-local storage provided a simpler alternative: when the request is created, fetch a db connection from the pool and store it in thread-local storage. Each subsystem then just uses the connection from thread-local storage, and when the request completes, it returns the connection to the pool. This solution had performance advantages, while also not requiring me to pass the db connection through every level, i.e. my parameter lists remained shorter.
In the same web app, I decided in one remote subsystem that I actually wanted to see the web Request object. So I either had to refactor to pass this object all the way down, which would have meant a lot of parameter passing and refactoring, or I could simply place the object into thread-local storage and retrieve it when I wanted it.
In both cases, you could argue that I had messed up the design in the first place, and was just using Thread Local storage to save my bacon. You might have a point. But I could also argue that Thread Local made for cleaner code, while remaining thread-safe.
Of course, I had to be very sure that the things I was putting into Thread Local were indeed one-and-only-one per thread. In the case of a web app, the Request object or a database connection fit this description nicely.
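The per-request connection pattern described above might be sketched in Java like this (the pool is faked with a Supplier/Consumer pair; a real implementation would hand out JDBC connections):

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hedged sketch: one connection per request, parked in a ThreadLocal
// so subsystems at any call depth can reach it without it appearing
// in every parameter list.
class RequestConnection {
    private static final ThreadLocal<Object> current = new ThreadLocal<>();

    // Called once when the request starts: borrow from the pool.
    static void open(Supplier<Object> pool) {
        current.set(pool.get());
    }

    // Any subsystem fetches the same per-thread connection.
    static Object get() {
        return current.get();
    }

    // Called when the request completes: return it and clear the slot.
    static void close(Consumer<Object> pool) {
        pool.accept(current.get());
        current.remove();
    }
}
```

The remove() call matters in servlet containers, where worker threads are pooled and reused across requests.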
I would like to add to the above answers that, as far as I know, allocation on the stack is faster than allocation on the heap.
Regarding passing local data across calls: if you allocate on the heap, you will need to pass the pointer/reference (I'm a Java guy :) ) to the calls; otherwise, how would you access the memory?
TLS is also good for storing a context that carries data across calls within a thread (we use it to hold information about the logged-on user for the duration of the thread; a sort of session management).
Thread-specific data is used when all the functions of a particular thread need to access one common variable. The variable is local to that thread but acts as a global variable within it.
Let's say we have two threads t1 and t2 of some process, and variable 'a' is thread-specific data for t1. Then t2 has no knowledge of 'a', but all the functions of t1 can access 'a' as a global variable, and any change to 'a' is seen by all of t1's functions.
With the OOP techniques now available, I find thread-specific data largely unnecessary. Instead of passing a plain function to the thread, you can pass a functor, and the functor object can hold any thread-specific data you need.
E.g., sample code with C++11 or Boost would look like this:
#include <list>
#include <boost/thread.hpp>

class MyClassFunctor
{
private:
    std::list<int> mylist; // thread-specific data lives in the functor

public:
    // Called when the thread runs; it can freely use mylist,
    // which belongs to this functor instance alone.
    void operator()()
    {
        mylist.push_back(42);
    }
};

// The functor object holds both the code the thread runs
// and any thread-specific data it needs.
MyClassFunctor functorObj;
boost::thread mythread(functorObj);

Synchronizing Asynchronous request handlers in Silverlight environment

For our senior design project, my group is building a Silverlight application that uses graph theory concepts and stores its data in a database on the back end. When we add a link between two nodes in the graph, we run an analysis to re-categorize our clusters of nodes. The problem is that this re-categorization is quite complex, involving multiple queries and updates to the database, so if multiple instances of it run at once they quickly garble the data and break (by trying to re-insert already-used primary keys). Essentially it's not thread-safe, we're trying to make it safe, and that's where we're failing and need help :).
The create link function looks like this:
private Semaphore dblock = new Semaphore(1, 1);

// This function is on our service reference and gets called
// by the client code.
public int addNeed(int nodeOne, int nodeTwo)
{
    dblock.WaitOne();
    submitNewNeed(createNewNeed(nodeOne, nodeTwo));
    verifyClusters(nodeOne, nodeTwo);
    dblock.Release();
    return 0;
}

private void verifyClusters(int nodeOne, int nodeTwo)
{
    // Run analysis of nodeOne and nodeTwo in graph
}
All calls to addNeed should wait for the first one in to finish before another executes, but instead they all seem to run at once and conflict with each other inside the verifyClusters method. One solution would be to force our front-end calls to be made synchronously, and in fact, when we do that, everything works fine, so the code logic isn't broken. But the application will be deployed in a business setting and used by internal IT staff (or at least that's the plan), so we'd hit the same problem there. We can't force all clients to submit data at different times, so we really need to get it synchronized on the back end. Thanks for any help you can give; I'd be glad to supply any additional information you might need!
I wrote a series to specifically address this situation - let me know if this works for you (sequential asynchronous workflows):
Part 2 (has a link back to the part1):
http://csharperimage.jeremylikness.com/2010/03/sequential-asynchronous-workflows-part.html
Jeremy
Wrap your database updates in a transaction; escalate to a table lock if necessary.
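Beyond transactions, note that the original semaphore can only work if every request actually shares it: if the service host creates a new service instance per call (a common default for web services), each request gets its own dblock and nothing is serialized. A hedged Java sketch of the shared-lock idea (names invented; the database work is reduced to a counter):

```java
import java.util.concurrent.locks.ReentrantLock;

class LinkService {
    // static: one lock per process, shared by every service instance,
    // so concurrent addNeed calls run strictly one at a time.
    private static final ReentrantLock dbLock = new ReentrantLock();
    private static int needCount = 0;  // stand-in for the database state

    int addNeed(int nodeOne, int nodeTwo) {
        dbLock.lock();
        try {
            needCount++;  // submitNewNeed + verifyClusters would go here
            return 0;
        } finally {
            dbLock.unlock();  // released even if the analysis throws
        }
    }

    static int count() {
        return needCount;
    }
}
```

The try/finally shape also fixes a latent bug in the original code: if verifyClusters throws, dblock.Release() is never reached and every later request deadlocks.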

Repository Pattern Question

I'm building an ASP.NET MVC app and I'm using repositories to store and retrieve view objects. My question is: is it okay for the implementations of the various repositories to call each other? I.e., can the ICustomerRepository implementation call an implementation of IAddressRepository, or should it handle its own updates to the address data source?
EDIT:
Thanks everyone; the Customer/Address example isn't real. The actual problem involves three aggregates that update a fourth aggregate in response to changes in their respective states. In this case there seems to be a conflict between introducing dependencies and violating the don't-repeat-yourself principle.
You should have a repository for each aggregate root.
I have no knowledge of your domain model, but it doesn't feel natural to me to have an IAddressRepository, unless 'Address' is an aggregate root in your domain.
In fact, in most circumstances, 'Address' is not even an entity, but a value object. That is, in most cases the identity of an 'Address' is determined by its value (the value of all its properties); it does not have a separate 'Id' (key) field.
So, when this is the case, the CustomerRepository should be responsible for storing the Address, as the Address is a part of the Customer aggregate-root.
Edit (ok so your situation was just an example):
But if you have other situations where one repository would need another, then I think it is better to move that functionality out of the repository and into a separate class (a Service).
I mean: if some functionality inside repository A relies on another repository B, then that functionality doesn't belong in repository A.
Instead, write another class (which DDD calls a Service) and implement the functionality there.
In any case, I don't think repositories should call each other. If you don't want to write a Service, however, and really want to keep that logic inside the repository itself, then pass the other repository as an argument to that specific method.
I hope I made myself a bit clear. :P
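A minimal Java sketch of the Service suggestion (every name here is invented): the cross-repository operation lives in a service that receives both repositories, so neither repository depends on the other.

```java
// Hypothetical repositories, one per aggregate root.
interface CustomerRepository {
    void save(String customerId);
}

interface AddressRepository {
    void save(String customerId, String address);
}

// The cross-aggregate operation lives in a Service, keeping the
// repositories free of dependencies on one another.
class CustomerRegistrationService {
    private final CustomerRepository customers;
    private final AddressRepository addresses;

    CustomerRegistrationService(CustomerRepository customers,
                                AddressRepository addresses) {
        this.customers = customers;
        this.addresses = addresses;
    }

    void register(String customerId, String address) {
        customers.save(customerId);
        addresses.save(customerId, address);
    }
}
```

Each repository can still be tested in isolation; only the service's tests need both.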
They really shouldn't call each other. A Repository is an abstraction of the (more or less) atomic operations that you want to perform on your domain. The less dependency they have, the better. Realistically, any consumer of a repository should expect to be able to point the repository class at a database and have it perform the necessary domain operations without a lot of configuration.
They should also represent "aggregates" in your domain - i.e. key focal points that a lot of functionality will be based around. I'm wondering why you would have a separate address information repository? Shouldn't that be part of your customer repository?
This depends on the type of repository (or at least the consequences do), but in general, if you have data repositories calling each other you're going to run into problems such as cyclic dependencies (repo A requires B, which requires C, which, oops, requires A) or recursive data loads (A requires B and C; C requires D and E; ... ad nauseam). Testing also becomes more difficult.
For example, you need to load your address repository to run your customer repository properly, because the customer repository calls the address repo. To test the customer repo, you'll need to do a db load of the addresses or mock them in some way, and ultimately you won't be able to load and test any single repository without loading them all.
Having those dependencies is also kind of insidious because they're often not clear - usually you're dealing with a repository as a data-holding abstraction - if you have to be conscious of how they depend on each other you can't use them as an abstraction, but have to manage the load process whenever you want to use them.

Can I get the instances of alive objects of a certain type in C#?

This is a C# 3.0 question. Can I use reflection or memory management classes provided by .net framework to count the total alive instances of a certain type in the memory?
I can do the same thing using a memory profiler, but that requires extra time to dump the memory and involves third-party software. What I want is only to monitor a certain type, with a lightweight method that fits easily into unit tests. The purpose of counting alive instances is to ensure I don't have any unexpected living instances causing a "memory leak".
Thanks.
To do it entirely within the application you could keep an instance counter, but it would need to be explicitly coded and managed inside each class; there's no silver bullet that I'm aware of that lets you query the framework from within executing code to see how many instances are alive.
What you're asking for is really the domain of a profiler. You can purchase one or build your own, but it requires your application to run as a child process of the profiler. Rolling your own isn't an easy undertaking, by the way.
If you want to consider the instance counter it would have to be something like:
public class MyClass : IDisposable
{
    private static readonly object _classInstancesLock = new object();
    private static int _classInstances;

    public MyClass()
    {
        lock (_classInstancesLock)
        {
            ++_classInstances;
        }
    }

    public void Dispose()
    {
        lock (_classInstancesLock)
        {
            --_classInstances;
        }
    }

    public static int ClassInstances
    {
        get
        {
            lock (_classInstancesLock)
            {
                return _classInstances;
            }
        }
    }
}
This is just a rough sample. Even with locking around the counter, it leaves the door wide open for Dispose to be called and the counter decremented while the object has not actually been collected (or for Dispose never to be called at all), so the count can drift from reality. To diagnose that bundle of joy you'll need, you guessed it, a professional profiler, or at least windbg.
Edit: I just noticed the very last line of your question, and I need to say that the above approach, failure-prone as it is, is almost guaranteed to deceive you about the true number of instances if you're experiencing a leak. The best tool, IMO, for attacking these problems is ANTS Memory Profiler. Version 5 is a double-edged sword in that they broke the Performance and Memory profilers into two separate SKUs (they used to be bundled together), but Memory Profiler 5.0 is absolutely lightning fast. Profiling these problems used to be slow as molasses, but they've gotten around that somehow.
Unless this is a personal project with zero intent of redistribution, you should invest the few hundred dollars needed for ANTS, but by all means use its trial period first. It's a great tool for exactly this kind of analysis.
The only way I see to do this without any form of instrumentation is to use the CLR Profiling API to track object lifetimes. I'm not aware of any API available to managed code to do the same thing, and, as far as I know, the CLR doesn't keep a list of live objects anywhere (so even with the profiling API you have to build the data structures for that yourself).
VB.NET has a feature where it lets you track objects in debugger, but it actually emits additional code specifically for that (which basically registers all created objects in internal list of weak references). You could do that as well, using e.g. PostSharp to post-process your assemblies.
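A hedged Java sketch of that weak-reference-registry idea (the Tracked class is invented): each constructor registers a WeakReference to the new object, and a count method prunes cleared entries, so only instances still reachable are counted. The same caveat applies as above: the count only drops once the GC has actually cleared the references.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Tracked {
    // Weak references let the registry observe instances without
    // keeping them alive.
    private static final List<WeakReference<Tracked>> registry =
            new ArrayList<>();

    Tracked() {
        synchronized (registry) {
            registry.add(new WeakReference<>(this));
        }
    }

    // Prune cleared references and report how many instances are
    // still reachable.
    static int liveCount() {
        synchronized (registry) {
            Iterator<WeakReference<Tracked>> it = registry.iterator();
            int alive = 0;
            while (it.hasNext()) {
                if (it.next().get() != null) {
                    alive++;
                } else {
                    it.remove();
                }
            }
            return alive;
        }
    }
}
```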

SSRS Code Shared Variables and Simultaneous Report Execution

We have some SSRS reports that are failing when two of them are executed very close together.
I've found out that if two instances of an SSRS report run at the same time, any Code variables declared at the class level (not inside a function) can collide. I suspect this may be the cause of our report failures and I'm working up a potential fix.
The reason we're using the Code portion of SSRS at all is for things like custom group and page header calculation. The code is called from expressions in TextBoxes and returns what the current label should be. The code needs to maintain state to remember what the last header value was in order return it when unknown or to store the new header value for reuse.
Note: here are my resources for the variable collision problem:
The MSDN SSRS Forum:
Because this uses static variables, if two people run the report at the exact same moment, there's a slim chance one will smash the other's variable state (in SQL 2000, this could occasionally happen due to two users paginating through the same report at the same time, not just due to exactly simultaneous executions). If you need to be 100% certain to avoid this, you can make each of the shared variables a hash table based on user ID (Globals!UserID).
Embedded Code in Reporting Services:
... if multiple users are executing the report with this code at the same time, both reports will be changing the same Count field (that is why it is a shared field). You don't want to debug these sorts of interactions – stick to shared functions using only local variables (variables passed ByVal or declared in the function body).
I guess the idea is that on the report server, the report is loaded and its Code module becomes a static class. If a second client asks for the same report quickly enough, it connects to the same instance of that static class. (You're welcome to correct my description if I'm getting this wrong.)
So, I was proceeding with the idea of using a hash table to keep things isolated. I was planning on the hash key being an internal report parameter called InstanceID with default =Guid.NewGuid().ToString().
Part way through my research into this, though, I found that it is even more complicated because Hashtables aren't thread-safe, according to Maintaining State in Reporting Services.
That writer has code similar to what I was developing, only the whole thread-safe thing is completely outside my experience. It's going to take me hours to research all this and put together sensible code that I can be confident of and that performs well.
So before I go too much farther, I'm wondering if anyone else has already been down this path and could give me some advice. Here's the code I have so far:
Private Shared Data As New System.Collections.Hashtable()

Public Shared Function Initialize() As String
    If Not Data.ContainsKey(Parameters!InstanceID.Value) Then
        Data.Add(Parameters!InstanceID.Value, New System.Collections.Hashtable())
    End If
    LetValue("SomethingCount", 0)
    Return ""
End Function

Private Shared Function GetValue(ByVal Name As String) As Object
    Return Data.Item(Parameters!InstanceID.Value).Item(Name)
End Function

Private Shared Sub LetValue(ByVal Name As String, ByVal Value As Object)
    Dim V As System.Collections.Hashtable = Data.Item(Parameters!InstanceID.Value)
    If Not V.ContainsKey(Name) Then
        V.Add(Name, Value)
    Else
        V.Item(Name) = Value
    End If
End Sub

Public Shared Function SomethingCount() As Long
    SomethingCount = GetValue("SomethingCount") + 1
    LetValue("SomethingCount", SomethingCount)
End Function
My biggest concern here is thread safety. I might be able to figure out the rest of the questions below, but I am not experienced with this and I know it is an area that it is EASY to go wrong in. The link above uses the method Dim _sht as System.Collections.Hashtable = System.Collections.Hashtable.Synchronized(_hashtable). Is that best? What about Mutex? Semaphore? I have no experience in this.
I think the namespace System.Collections for Hashtable is correct, but I'm having trouble adding System.Collections as a reference in my report to try to cure my current error of "Could not load file or assembly 'System.Collections'". When I browse to add the reference, it's not an available component to select.
I just confirmed that I can call code from a parameter's default value expression, so I'll put my Initialize code there. I also just found out about the OnInit procedure, but this has its own gotchas to research and work around: the Parameters collection may not be referenced from the OnInit method during parameter initialization.
I'm unsure about declaring the Data variable with New; perhaps it should only be instantiated in the initializer if not already done (but I worry about race conditions because of the delay between checking that it's Nothing and instantiating it).
I also have a question about the Shared keyword. Is it necessary in all cases? I get errors if I leave it off function declarations, but it appears to work when I leave it off the variable declaration. Testing multiple simultaneous report executions is difficult... Could someone explain what Shared means specifically in the context of SSRS Code?
Is there a better way to initialize variables? Should I provide a second parameter to the GetValue function which is the default value to use if it finds that the variable doesn't exist in the hashtable yet?
Is it better to have nested Hashtables as I chose in my implementation, or to concatenate my InstanceID with the variable name to have a flat hashtable?
I'd really appreciate guidance, ideas and/or critiques on any aspect of what I've presented here.
Thank you!
Erik
Your code looks fine. For thread safety, only the root (shared) hashtable Data needs to be synchronized. If you want to avoid using your InstanceID, you could concatenate Globals!ExecutionTime and User!UserID instead.
Basically I think you just want to change the initialization to this (note that VB.NET uses plain assignment; VB6's `Set` keyword is gone):
Private Shared Data As System.Collections.Hashtable

If Data Is Nothing Then
    Data = System.Collections.Hashtable.Synchronized(New System.Collections.Hashtable())
End If
The contained hashtables should only be used by one thread at a time anyway, but if in doubt, you could synchronize them too.
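For comparison, here is how the same per-execution isolation might look in Java (all names invented): ConcurrentHashMap.computeIfAbsent creates each execution's private map atomically, closing the check-then-add race the question worries about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReportState {
    // Root map shared by all executions; ConcurrentHashMap makes it
    // safe without explicit locking.
    private static final Map<String, Map<String, Object>> data =
            new ConcurrentHashMap<>();

    // The per-execution map is created atomically on first use.
    static Map<String, Object> forInstance(String instanceId) {
        return data.computeIfAbsent(instanceId, id -> new ConcurrentHashMap<>());
    }

    // Mirrors SomethingCount(): each execution increments only its
    // own counter, so executions never see each other's values.
    // (Like the contained hashtables above, each per-execution map is
    // assumed to be touched by one request at a time.)
    static long nextCount(String instanceId, String name) {
        Map<String, Object> values = forInstance(instanceId);
        long next = ((Number) values.getOrDefault(name, 0L)).longValue() + 1;
        values.put(name, next);
        return next;
    }
}
```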
