Can I get the instances of alive objects of a certain type in C#? - winforms

This is a C# 3.0 question. Can I use reflection or memory management classes provided by .net framework to count the total alive instances of a certain type in the memory?
I can do the same thing using a memory profiler but that requires extra time to dump the memory and involves a third party software. What I want is only to monitor a certain type and I want a light-weighted method which can go easily to unit tests. The purpose to count the alive instances is to ensure I don't have any expected living instances that cause "memory leak".
Thanks.

To do it entirely within the application you could do an instance-counter, but it would need to be explicitly coded and managed inside each class--there's no silver bullet that I'm aware of to let you query the framework from within the executing code to see how many instances are alive.
What you're asking for is really the domain of a profiler. You can purchase one or build your own, but it requires your application to run as a child process of the profiler. Rolling your own isn't an easy undertaking, by the way.
If you want to consider the instance counter it would have to be something like:
public class MyClass : IDisposable
public MyClass()
{
++ClassInstances;
}
public void Dispose()
{
--ClassInstances;
}
private static new object _ClassInstancesLock;
private static int _ClassInstances;
private static int ClassInstances
{
get
{
lock (_ClassInstancesLock)
{
return _ClassInstances
}
}
}
This is just a really rough sample, no tests for compilation; 0% guarantee for thread-safety (critical for this type of approach) and it leaves the door wide open for Dispose to be called, the instance counter to decrement, but for the object not to properly GC. To diagnose that bundle of joy you'll need, you guessed it, a professional profiler--or at least windbg.
Edit: I just noticed the very last line of your question and needed to say that my above approach, as shoddy and failure-prone as it is, is almost guaranteed to deceive and lie to you about the true number of instances if you're experiencing a leak. The best tool, IMO, for attacking these problems is ANTS Memory Profiler. Version 5 is a double-edge in that they broke Performance and Memory profiler into two seperate SKUs (used to be bundled together) but Memory Profiler 5.0 is absolutely lightning fast. Profiling these problems used to be slow as molases, but they've gotten around it somehome.
Unless this is for a personal project with 0 intent of redistribution you should invest the few hundred dollars needed for ANTS--but by all means use it's trial period first. It's a great tool for exactly this kind of analysis.

The only way I see to do this is without any form of instrumentation to use the CLR Profiling API to track object lifetimes. I'm not aware of any APIs available to the managed code to do the same thing, and, so far as I know, CLR doesn't keep the list of live objects anywhere (so even with profiler API you have to create the data structures for that yourself).
VB.NET has a feature where it lets you track objects in debugger, but it actually emits additional code specifically for that (which basically registers all created objects in internal list of weak references). You could do that as well, using e.g. PostSharp to post-process your assemblies.

Related

Apache Flink - Filter Performance Tips

Let's say you are working on a big flink project. And also you are keyBy the client ip addresses of your customers.
And realized that you are going to filter the same things in the different code places like that:
public void calculationOne(){
kafkaSource.filter(isContainsSmthA).keyBy(clientip).process(processA).sink(...);
}
public void calculationTwo(){
kafkaSource.filter(isContainsSmthA).keyBy(clientip).process(processB).sink(...);
}
And assumed that they are many kafkaSource.filter(isContainsSmthA)..
Now this structure leads to performance issue in the flink?
If I did something like the below, would be much better?
public Stream filteredA(){
return kafkaSource.filter(isContainsSmthA);
public void calculationOne(){
filteredA().keyBy(clientip).process(processA).sink(...);
}
public void calculationTwo(){
filteredA().keyBy(clientip).process(processB).sink(...);
}
It depends a bit on how it should behave operationally.
The first way is a more friendly to the Kafka cluster: all records are read once. The filter itself is a very cheap operation, so you don't need to worry to much about it. However, the big downside of this approach is that if one calculations is much slower than the others, it will slow them down. If you do not process historic events, it shouldn't matter as you'd size your application cluster to keep up with all events anyways. Another current downside is that if you have a failure in calculationTwo also tasks in calculationOne are restarted. The community is actively working to mitigate that though.
The second way would allow only the affected source -> ... -> sink subtopology to be restarted. So if you expect frequent restarts or need to guarantee certain SLAs, this approach is better. An extension is to actually have separate Flink applications for each of these pipelines. You can share the same jar, but use different arguments to select the correct pipeline on submission. This approach also makes updating of applications much easier as you would only experience downtime for the pipeline that you actually modify.
I might do something like below, where a simple wrapper operator can run data through two different functions, and generate two side outputs.
SingleOutputStreamOperator comboResults = kafkaSource
.filter(isContainsSmthA)
.keyBy(clientip)
.process(new MyWrapperFunction(processA, processB));
comboResults
.getSideOutput(processATag)
.sink(...);
comboResults
.getSideOutput(processBTag)
.sink(...);
Though I don't know how that compares with what Arvid suggested.

Short lived DbContext in WPF application reasonable?

In his book on DbContext, #RowanMiller shows how to use the DbSet.Local property to avoid 1.) unnecessary roundtrips to the database and 2.) passing around collections (created with e.g. ToList()) in the application (page 24). I then tried to follow this approach. However, I noticed that from one using [} – block to another, the DbSet.Local property becomes empty:
ObservableCollection<Destination> destinationsList;
using (var context = new BAContext())
{
var query = from d in context.Destinations …;
query.Load();
destinationsList = context.Destinations.Local; //Nonzero here.
}
//Do stuff with destinationsList
using (var context = new BAContext())
{
//context.Destinations.Local zero here again;
//So no way of getting the in-memory data from the previous using- block here?
//Do I have to do another roundtrip to the database here to get the same data I wanted
//to cache locally???
}
Then, what is the point on page 24? How can I avoid the passing around of my collections if the DbSet.Local is only usable inside the using- block? Furthermore, how can I benefit from the change tracking if I use these short-lived context instances not handing over any cache data to each others under the hood? So, if the contexts should be short-lived for freeing resources such as connections, have I to give up the caching for this? I.e. I can’t use both at the same time (short-lived connections but long-lived cache)? So my only option would be to store the results returned by the query in my own variables, exactly what is discouraged in the motivation on page 24?
I am developing a WPF application which maybe will also become multi-tiered in the future, involving WCF. I know Julia has an example of this in her book, but I currently don’t have access to it. I found several others on the web, e.g. http://msdn.microsoft.com/en-us/magazine/cc700340.aspx (old ObjectContext, but good in explaining the inter-layer-collaborations). There, a long-lived context is used (although the disadvantages are mentioned, but no solution to these provided).
It’s not only that the single Destinations.Local gets lost, as you surely know all other entities fetched by the query are, too.
[Edit]:
After some more reading in Julia Lerman’s book, it seems to boil down to that EF does not have 2nd level caching per default; with some (considerable, I think) effort, however, ones can add 3rd party caching solutions, as is also described in the book and in various articles on MSDN, codeproject etc.
I would have appreciated if this problem had been mentioned in the section about DbSet.Local in the DbContext book that it is in fact a first level cache which is destroyed when the using {} block ends (just my proposal to make it more transparent to the readers). After first reading I had the impression DbSet.Local would always return the same reference (Singleton-style) also in the second using {} block despite the new DbContext instance.
But I am still unsure whether the 2nd level cache is the way to go for my WPF application (as Julia mentions the 2nd level cache in her article for distributed applications)? Or is the way to go to get my aggregate root instances (DDD, Eric Evans) of my domain model into memory by one or some queries in a using {} block, disposing the DbContext and only holding the references to the aggregate instances, this way avoiding a long-lived context? It would be great if you could help me with this decision.
http://msdn.microsoft.com/en-us/magazine/hh394143.aspx
http://www.codeproject.com/Articles/435142/Entity-Framework-Second-Level-Caching-with-DbConte
http://blog.3d-logic.com/2012/03/31/using-tracing-and-caching-provider-wrappers-with-codefirst/
The Local property provides a “local view of all Added, Unchanged, and Modified entities in this set”. Like all change tracking it is specific to the context you are currently using.
The DB Context is a workspace for loading data and preparing changes.
If two users were to add changes at the same time, they must not know of the others changes before they saved them. They may discard their prepared changes which suddenly would lead to problems for other other user as well.
A DB Context should be short lived indeed, but may be longer than super short when necessary. Also consider that you may not save resources by keeping it short lived if you do not load and discard data but only add changes you will save. But it is not only about resources but also about the DB state potentially changing while the DB Context is still active and has data loaded; which may be important to keep in mind for longer living contexts.
If you do not know yet all related changes you want to save into the database at once then I suggest you do not use the DB Context to store your changes in-memory but in a data structure in your code.
You can of course use entity objects for doing so without an active DB Context. This makes sense if you do not have another appropriate data class for it and do not want to create one, or decide preparing the changes in them make more sense. You can then use DbSet.Attach to attach the entities to a DB Context for saving the changes when you are ready.

is it a bad practice to have a static field?

I use a static field in this situation because I think it is time consuming to recreate the object at each request.
private static AnalysedCompanies db = new AnalysedCompanies();
public class AnalysedCompanies:DbContext
{
...
}
I use Entity Framework code first.
than I have methods for saving and loading data from the database trough the db object.
Is the static db object going to cause a bottleneck? Is this the right thing to do?
In a ASP.net Web Application, static are shared by all users, so yes, that's pretty bad as it means that User A can possibly see/modify data that User B sees and leads to all sorts of headaches.
Static fields are fine for static data, that is data that a) is shared for everyone and b) isn't modified by users (as changes are global to all other users). I do use statics for stuff like System Configuration or objects that can be safely shared.
I think the main problem is this: "I think it is time consuming" - don't guess, measure. There are many profilers available for .net. If you have performance issues, measure to see if it really is a problem and then act.

Are there any good software development "patterns" for memory intensive .net programs?

Basically I'm working on a program that processes a lot of large video and image files, and I'm struggling with the memory management side of it because I've never dealt with anything quite like this before.
For instance, it stores all these images in a database, and loads a list of videos, and then you can switch between the videos and view images from the video. Right now, it's keeping all of those images in memory all the time, which is eating up a lot of space. I know I can lazy load the images, but once you've switched back and forth you get all of them stuck in memory.
I want to take advantage of the WPF databinding functionality and MVVM as much as possible, but if I need to look at a different architecture I will.
I'm just looking for general advice, tips, links to articles, or anything that could help.
One of the things you could look at is data virtualization, which is not provided in WPF by default (they provide UI virtualization instead). Data virtualization can say 'load and bind the data for an item / range of items while visible, then unload when not visible'.
Here's a great article that describes a concrete implementation that you may be able to use as-is or adapt:
http://www.codeproject.com/KB/WPF/WpfDataVirtualization.aspx
It sounds like the main problem you're having is not so much the performance-intensiveness of the application (which things like fixed-size buffers and static allocation will help with) but its overall memory footprint. The way to control that is with virtualization.
Lazy loading gets you halfway there: you don't actually create the object until something needs it. That's fine, but the longer the user works with the application and the more objects he visits in the UI, the more objects get created, and eventually the application runs out of memory.
So you want to throw away objects that the user doesn't need anymore. Figuring out which objects the user doesn't need can be a hard problem, but it can also be as easy as assuming that the user doesn't need the object that he used least recently. You use a least-recently-used (LRU) cache to do this.
This is totally consistent with the MVVM pattern. In your view class, you make your property getter for the object use this pseudocode:
if object hasn't been loaded
load object
add object to the LRU cache (whether you loaded it or not)
return object
The LRU cache I wrote keeps a simple queue of the objects it contains. When you add an object to the cache, if it's not already in the queue it gets added to the back, and if it is already in the queue it gets moved to the back.
If the queue's at its capacity when you add an object, it pops off whatever is at the front of the queue (which is the one that was used least recently) and raises the DiscardingOldestItem event.
This event is the object's chance to tell anything that holds a reference to it (i.e. the view object that it's a property of) that it needs to be discarded (probably by raising an event of its own). The view object's event handler should first raise the PropertyChanged event. If the property getter gets called when it does this, there's a binding somewhere that's still looking at the property, so it shouldn't be discarded yet. (Also, since the getter was called, the object just got moved to the back of the queue.) Otherwise, it can be thrown away.
(Note that if you have more objects visible in the UI than the cache can hold, this little dance becomes an infinite loop and you'll get a stack overflow.)
A more sophisticated approach would have the LRU cache start discarding old items when the application started running low on memory (it uses a fixed capacity right now). That's a straightforward change, but if you make that change, the scenario described in the previous paragraph is something you need to give more thought to; one very large object could result in the whole UI going kablooey.
It seems that to increase raw performance you would actually want to avoid patterns. They have their uses, don't get me wrong, but if you're trying to blast video at the highest performance possible the last thing you need to do it introduce abstraction layers that are designed to write higher quality code, not increase application performance.
this article on informIt has a lot of good info on the subject although it is more c and c++.
Static Allocation Pattern: Allocates memory up front
It suggests,
Pool Allocation Pattern: Preallocates pools of needed objects
Fixed Sized Buffer Pattern: Allocates memory in same-sized blocks
Smart Pointer Pattern: Makes pointers reliable
Garbage Collection Pattern: Automatically reclaims lost memory
Garbage Compactor Pattern: Automatically defragments and reclaims memory
"I know I can lazy load the images,
but once you've switched back and
forth you get all of them stuck in
memory."
This is not true to my understanding. The images can get garbage collected just like anything else, by removing all references. Are you sure you dont have a reference to them somewhere? Try a memory profiler like memprofiler or ANTS to see whats happening.
To those who have found this question looking for general patterns (not WPF) to reduce memory, the famous one (which I have never seen used!) is The Flyweight pattern

How to unit test an object with database queries

I've heard that unit testing is "totally awesome", "really cool" and "all manner of good things" but 70% or more of my files involve database access (some read and some write) and I'm not sure how to write a unit test for these files.
I'm using PHP and Python but I think it's a question that applies to most/all languages that use database access.
I would suggest mocking out your calls to the database. Mocks are basically objects that look like the object you are trying to call a method on, in the sense that they have the same properties, methods, etc. available to caller. But instead of performing whatever action they are programmed to do when a particular method is called, it skips that altogether, and just returns a result. That result is typically defined by you ahead of time.
In order to set up your objects for mocking, you probably need to use some sort of inversion of control/ dependency injection pattern, as in the following pseudo-code:
class Bar
{
private FooDataProvider _dataProvider;
public instantiate(FooDataProvider dataProvider) {
_dataProvider = dataProvider;
}
public getAllFoos() {
// instead of calling Foo.GetAll() here, we are introducing an extra layer of abstraction
return _dataProvider.GetAllFoos();
}
}
class FooDataProvider
{
public Foo[] GetAllFoos() {
return Foo.GetAll();
}
}
Now in your unit test, you create a mock of FooDataProvider, which allows you to call the method GetAllFoos without having to actually hit the database.
class BarTests
{
public TestGetAllFoos() {
// here we set up our mock FooDataProvider
mockRepository = MockingFramework.new()
mockFooDataProvider = mockRepository.CreateMockOfType(FooDataProvider);
// create a new array of Foo objects
testFooArray = new Foo[] {Foo.new(), Foo.new(), Foo.new()}
// the next statement will cause testFooArray to be returned every time we call FooDAtaProvider.GetAllFoos,
// instead of calling to the database and returning whatever is in there
// ExpectCallTo and Returns are methods provided by our imaginary mocking framework
ExpectCallTo(mockFooDataProvider.GetAllFoos).Returns(testFooArray)
// now begins our actual unit test
testBar = new Bar(mockFooDataProvider)
baz = testBar.GetAllFoos()
// baz should now equal the testFooArray object we created earlier
Assert.AreEqual(3, baz.length)
}
}
A common mocking scenario, in a nutshell. Of course you will still probably want to unit test your actual database calls too, for which you will need to hit the database.
Ideally, your objects should be persistent ignorant. For instance, you should have a "data access layer", that you would make requests to, that would return objects. This way, you can leave that part out of your unit tests, or test them in isolation.
If your objects are tightly coupled to your data layer, it is difficult to do proper unit testing. The first part of unit test, is "unit". All units should be able to be tested in isolation.
In my C# projects, I use NHibernate with a completely separate Data layer. My objects live in the core domain model and are accessed from my application layer. The application layer talks to both the data layer and the domain model layer.
The application layer is also sometimes called the "Business Layer".
If you are using PHP, create a specific set of classes ONLY for data access. Make sure your objects have no idea how they are persisted and wire up the two in your application classes.
Another option would be to use mocking/stubs.
The easiest way to unit test an object with database access is using transaction scopes.
For example:
[Test]
[ExpectedException(typeof(NotFoundException))]
public void DeleteAttendee() {
using(TransactionScope scope = new TransactionScope()) {
Attendee anAttendee = Attendee.Get(3);
anAttendee.Delete();
anAttendee.Save();
//Try reloading. Instance should have been deleted.
Attendee deletedAttendee = Attendee.Get(3);
}
}
This will revert back the state of the database, basically like a transaction rollback so you can run the test as many times as you want without any sideeffects. We've used this approach successfully in large projects. Our build does take a little long to run (15 minutes), but it is not horrible for having 1800 unit tests. Also, if build time is a concern, you can change the build process to have multiple builds, one for building src, another that fires up afterwards that handles unit tests, code analysis, packaging, etc...
I can perhaps give you a taste of our experience when we began looking at unit testing our middle-tier process that included a ton of "business logic" sql operations.
We first created an abstraction layer that allowed us to "slot in" any reasonable database connection (in our case, we simply supported a single ODBC-type connection).
Once this was in place, we were then able to do something like this in our code (we work in C++, but I'm sure you get the idea):
GetDatabase().ExecuteSQL( "INSERT INTO foo ( blah, blah )" )
At normal run time, GetDatabase() would return an object that fed all our sql (including queries), via ODBC directly to the database.
We then started looking at in-memory databases - the best by a long way seems to be SQLite. (http://www.sqlite.org/index.html). It's remarkably simple to set up and use, and allowed us subclass and override GetDatabase() to forward sql to an in-memory database that was created and destroyed for every test performed.
We're still in the early stages of this, but it's looking good so far, however we do have to make sure we create any tables that are required and populate them with test data - however we've reduced the workload somewhat here by creating a generic set of helper functions that can do a lot of all this for us.
Overall, it has helped immensely with our TDD process, since making what seems like quite innocuous changes to fix certain bugs can have quite strange affects on other (difficult to detect) areas of your system - due to the very nature of sql/databases.
Obviously, our experiences have centred around a C++ development environment, however I'm sure you could perhaps get something similar working under PHP/Python.
Hope this helps.
You should mock the database access if you want to unit test your classes. After all, you don't want to test the database in a unit test. That would be an integration test.
Abstract the calls away and then insert a mock that just returns the expected data. If your classes don't do more than executing queries, it may not even be worth testing them, though...
The book xUnit Test Patterns describes some ways to handle unit-testing code that hits a database. I agree with the other people who are saying that you don't want to do this because it's slow, but you gotta do it sometime, IMO. Mocking out the db connection to test higher-level stuff is a good idea, but check out this book for suggestions about things you can do to interact with the actual database.
I usually try to break up my tests between testing the objects (and ORM, if any) and testing the db. I test the object-side of things by mocking the data access calls whereas I test the db side of things by testing the object interactions with the db which is, in my experience, usually fairly limited.
I used to get frustrated with writing unit tests until I start mocking the data access portion so I didn't have to create a test db or generate test data on the fly. By mocking the data you can generate it all at run time and be sure that your objects work properly with known inputs.
Options you have:
Write a script that will wipe out database before you start unit tests, then populate db with predefined set of data and run the tests. You can also do that before every test – it'll be slow, but less error prone.
Inject the database. (Example in pseudo-Java, but applies to all OO-languages)
class Database {
public Result query(String query) {... real db here ...}
}
class MockDatabase extends Database {
public Result query(String query) {
return "mock result";
}
}
class ObjectThatUsesDB {
public ObjectThatUsesDB(Database db) {
this.database = db;
}
}
now in production you use normal database and for all tests you just inject the mock database that you can create ad hoc.
Do not use DB at all throughout most of code (that's a bad practice anyway). Create a "database" object that instead of returning with results will return normal objects (i.e. will return User instead of a tuple {name: "marcin", password: "blah"}) write all your tests with ad hoc constructed real objects and write one big test that depends on a database that makes sure this conversion works OK.
Of course these approaches are not mutually exclusive and you can mix and match them as you need.
Unit testing your database access is easy enough if your project has high cohesion and loose coupling throughout. This way you can test only the things that each particular class does without having to test everything at once.
For example, if you unit test your user interface class the tests you write should only try to verify the logic inside the UI worked as expected, not the business logic or database action behind that function.
If you want to unit test the actual database access you will actually end up with more of an integration test, because you will be dependent on the network stack and your database server, but you can verify that your SQL code does what you asked it to do.
The hidden power of unit testing for me personally has been that it forces me to design my applications in a much better way than I might without them. This is because it really helped me break away from the "this function should do everything" mentality.
Sorry I don't have any specific code examples for PHP/Python, but if you want to see a .NET example I have a post that describes a technique I used to do this very same testing.
I agree with the first post - database access should be stripped away into a DAO layer that implements an interface. Then, you can test your logic against a stub implementation of the DAO layer.
You could use mocking frameworks to abstract out the database engine. I don't know if PHP/Python got some but for typed languages (C#, Java etc.) there are plenty of choices
It also depends on how you designed those database access code, because some design are easier to unit test than other like the earlier posts have mentioned.
I've never done this in PHP and I've never used Python, but what you want to do is mock out the calls to the database. To do that you can implement some IoC whether 3rd party tool or you manage it yourself, then you can implement some mock version of the database caller which is where you will control the outcome of that fake call.
A simple form of IoC can be performed just by coding to Interfaces. This requires some kind of object orientation going on in your code so it may not apply to what your doing (I say that since all I have to go on is your mention of PHP and Python)
Hope that's helpful, if nothing else you've got some terms to search on now.
Setting up test data for unit tests can be a challenge.
When it comes to Java, if you use Spring APIs for unit testing, you can control the transactions on a unit level. In other words, you can execute unit tests which involves database updates/inserts/deletes and rollback the changes. At the end of the execution you leave everything in the database as it was before you started the execution. To me, it is as good as it can get.

Resources