I am creating a silverlight pivot collection with 31K items (and images), however when I'm using the DeepZoomTools library to create the deep zoom images; it takes hours and hours (and hasn't actually completed even one).
Is there a multi-threaded way or distributed way in which collections could be created?
It is a time intensive process to be sure. Does your individual data points change often? What we have found in nearly all of our projects that the image for an individual item almost never changes. This allows you to streamline the process a little bit.
What I do in a case like this is to initially process the entire dataset. Then the next time I run the process, I only update the images that have been added or modified. As I said, in almost all of my cases this solved the problem you are running into. In fact, when it works, I will plug my card generation into whatever business applications that are running and generate/modify a card when data is added/changed in the system. This removes the need for batch processing altogether after your initial build.
If that will not work for you, take a look at the code for PAuthor. It is using DeepZoomTools and does so in a multi-threaded way. You should be able to find the code you are looking for there. PAuthor - CodePlex
Let me know if you have more specifics about your specific needs and we can see if we can come up with something.
Related
I'm currently developing some sort of I/O-pipeline system. Simply said: You can run simultaneous workers which do some stuff, either import or export data. I don't want to limit the user in how many workers will run simultaneously as performance, logically, depends on how they work.
If one wants to import 30 small images simultaneously, then he shall do this. But I want to equip my application with a monitor which notices when the application, especially the main thread, runs to slow, so that it will lag visibly. If this happens, the pipeline should reduce the amount of workers and maybe pause some of them in order so stabilize the application.
Is there any way to do this? How can I effectively monitor the speed, so I can say it is definetly to slow?
EDIT:
Okay I may have been a bit unclear. Sorry for that. The problem is more the status invokes. The user should be able to see what's going on. Therefore every worker invokes status updates. This happens in real-time. As you can imagine, this causes massive lags as there are a few hundred invokes per second. I tackled this problem by adding a lock which filters reports so that just every 50ms a report gets really invoked. The problem is: It still causes lags when there are about 20-30 workers active. I thought the solution was: Adjusting the time-lock by the current CPU load.
But I want to equip my application with a monitor which notices when the application, especially the main thread, runs to slow, so that it will lag visibly.
I don't quote understand this. I believe there are two common cases of UI lag:
Application that runs significant amounts of CPU-bound code on the UI thread or that performs synchronous IO operations there.
Showing way too many items in your UI, like a data grid with many thousand rows. Such UI is not useful to the user, so you should figure out a better way of presenting all those items.
In both cases, the lag can be avoided and you don't need any monitor for that, you need to redesign your application.
I have an asp.net-mvc website and people manage a list of projects. Based on some algorithm, I can tell if a project is out of date. When a user logs in, i want it to show the number of stale projects (similar to when i see a number of updates in the inbox).
The algorithm to calculate stale projects is kind of slow so if everytime a user logs in, i have to:
Run a query for all project where they are the owner
Run the IsStale() algorithm
Display the count where IsStale = true
My guess is that will be real slow. Also, on everything project write, i would have to recalculate the above to see if changed.
Another idea i had was to create a table and run a job everything minutes to calculate stale projects and store the latest count in this metrics table. Then just query that when users log in. The issue there is I still have to keep that table in sync and if it only recalcs once every minute, if people update projects, it won't change the value until after a minute.
Any idea for a fast, scalable way to support this inbox concept to alert users of number of items to review ??
The first step is always proper requirement analysis. Let's assume I'm a Project Manager. I log in to the system and it displays my only project as on time. A developer comes to my office an tells me there is a delay in his activity. I select the developer's activity and change its duration. The system still displays my project as on time, so I happily leave work.
How do you think I would feel if I receive a phone call at 3:00 AM from the client asking me for an explanation of why the project is no longer on time? Obviously, quite surprised, because the system didn't warn me in any way. Why did that happen? Because I had to wait 30 seconds (why not only 1 second?) for the next run of a scheduled job to update the project status.
That just can't be a solution. A warning must be sent immediately to the user, even if it takes 30 seconds to run the IsStale() process. Show the user a loading... image or anything else, but make sure the user has accurate data.
Now, regarding the implementation, nothing can be done to run away from the previous issue: you will have to run that process when something that affects some due date changes. However, what you can do is not unnecessarily run that process. For example, you mentioned that you could run it whenever the user logs in. What if 2 or more users log in and see the same project and don't change anything? It would be unnecessary to run the process twice.
Whatsmore, if you make sure the process is run when the user updates the project, you won't need to run the process at any other time. In conclusion, this schema has the following advantages and disadvantages compared to the "polling" solution:
Advantages
No scheduled job
No unneeded process runs (this is arguable because you could set a dirty flag on the project and only run it if it is true)
No unneeded queries of the dirty value
The user will always be informed of the current and real state of the project (which is by far, the most important item to address in any solution provided)
Disadvantages
If a user updates a project and then upates it again in a matter of seconds the process would be run twice (in the polling schema the process might not even be run once in that period, depending on the frequency it has been scheduled)
The user who updates the project will have to wait for the process to finish
Changing to how you implement the notification system in a similar way to StackOverflow, that's quite a different question. I guess you have a many-to-many relationship with users and projects. The simplest solution would be adding a single attribute to the relationship between those entities (the middle table):
Cardinalities: A user has many projects. A project has many users
That way when you run the process you should update each user's Has_pending_notifications with the new result. For example, if a user updates a project and it is no longer on time then you should set to true all users Has_pending_notifications field so that they're aware of the situation. Similarly, set it to false when the project is on time (I understand you just want to make sure the notifications are displayed when the project is no longer on time).
Taking StackOverflow's example, when a user reads a notification you should set the flag to false. Make sure you don't use timestamps to guess if a user has read a notification: logging in doesn't mean reading notifications.
Finally, if the notification itself is complex enough, you can move it away from the relationship between users and projects and go for something like this:
Cardinalities: A user has many projects. A project has many users. A user has many notifications. A notifications has one user. A project has many notifications. A notification has one project.
I hope something I've said has made sense, or give you some other better idea :)
You can do as follows:
To each user record add a datetime field sayng the last time the slow computation was done. Call it LastDate.
To each project add a boolean to say if it has to be listed. Call it: Selected
When you run the Slow procedure set you update the Selected fileds
Now when the user logs if LastDate is enough close to now you use the results of the last slow computation and just take all project with Selected true. Otherwise yourun again the slow computation.
The above procedure is optimal, becuase it re-compute the slow procedure ONLY IF ACTUALLY NEEDED, while running a procedure at fixed intervals of time...has the risk of wasting time because maybe the user will neber use the result of a computation.
Make a field "stale".
Run a SQL statement that updates stale=1 with all records where stale=0 AND (that algorithm returns true).
Then run a SQL statement that selects all records where stale=1.
The reason this will work fast is because SQL parsers, like PHP, shouldn't do the second half of the AND statement if the first half returns true, making it a very fast run through the whole list, checking all the records, trying to make them stale IF NOT already stale. If it's already stale, the algorithm won't be executed, saving you time. If it's not, the algorithm will be run to see if it's become stale, and then stale will be set to 1.
The second query then just returns all the stale records where stale=1.
You can do this:
In the database change the timestamp every time a project is accessed by the user.
When the user logs in, pull all their projects. Check the timestamp and compare it with with today's date, if it's older than n-days, add it to the stale list. I don't believe that comparing dates will result in any slow logic.
I think the fundamental questions need to be resolved before you think about databases and code. The primary of these is: "Why is IsStale() slow?"
From comments elsewhere it is clear that the concept that this is slow is non-negotiable. Is this computation out of your hands? Are the results resistant to caching? What level of change triggers the re-computation.
Having written scheduling systems in the past, there are two types of changes: those that can happen within the slack and those that cause cascading schedule changes. Likewise, there are two types of rebuilds: total and local. Total rebuilds are obvious; local rebuilds try to minimize "damage" to other scheduled resources.
Here is the crux of the matter: if you have total rebuild on every update, you could be looking at 30 minute lags from the time of the change to the time that the schedule is stable. (I'm basing this on my experience with an ERP system's rebuild time with a very complex workload).
If the reality of your system is that such tasks take 30 minutes, having a design goal of instant gratification for your users is contrary to the ground truth of the matter. However, you may be able to detect schedule inconsistency far faster than the rebuild. In that case you could show the user "schedule has been overrun, recomputing new end times" or something similar... but I suspect that if you have a lot of schedule changes being entered by different users at the same time the system would degrade into one continuous display of that notice. However, you at least gain the advantage that you could batch changes happening over a period of time for the next rebuild.
It is for this reason that most of the scheduling problems I have seen don't actually do real time re-computations. In the context of the ERP situation there is a schedule master who is responsible for the scheduling of the shop floor and any changes get funneled through them. The "master" schedule was regenerated prior to each shift (shifts were 12 hours, so twice a day) and during the shift delays were worked in via "local" modifications that did not shuffle the master schedule until the next 12 hour block.
In a much simpler situation (software design) the schedule was updated once a day in response to the day's progress reporting. Bad news was delivered during the next morning's scrum, along with the updated schedule.
Making a long story short, I'm thinking that perhaps this is an "unask the question" moment, where the assumption needs to be challenged. If the re-computation is large enough that continuous updates are impractical, then aligning expectations with reality is in order. Either the algorithm needs work (optimizing for local changes), the hardware farm needs expansion or the timing of expectations of "truth" needs to be recalibrated.
A more refined answer would frankly require more details than "just assume an expensive process" because the proper points of attack on that process are impossible to know.
i'm using C++ managed 2010 for designing a GUI in a form.h file. The GUI acts as a master querying data streaming from slave card.
Pressing a button a function (in the ApplicationIO.cpp file) is called in which 2 threads are created by using API win32 (CREATETHREAD(...)): the former is for handling data streaming, and the latter is for data parsing and data monitoring on real time grpah on the GUI.
The project has two different behaviour: if it starts in debugging mode it is able to update GUI controls as textbox (using invoke) and graph during data straming, contrariwise when it starts without debugging no data appears in the textbox, and data are shown very slowly on the chart.
has anyone ever addressed a similar problem? Any suggestion, please?
A pretty classic mistake is to use Control::Begin/Invoke() too often. You'll flood the UI thread with delegate invoke requests. UI updates tend to be expensive, you can easily get into a state where the message loop doesn't get around to doing its low-priority duties. Like painting. This happens easily, invoking more than a thousand times per second is the danger zone, depending on how much time is spent by the delegate targets.
You solve this by sending updates at a realistic rate, one that takes advantage of the ability of the human eye to distinguish them. At 25 times per second, the updates turn into a blur, updating it any faster is just a waste of cpu cycles. Which leaves lots of time for the UI thread to do what it needs to do.
This might still not be sufficiently slow when the updates are expensive. At which point you'll need to skip updates or throttle the worker thread. Note that Invoke() automatically throttles, BeginInvoke() doesn't.
I am "slowly" moving into Silverlight from asp.net and have a question about how to deal with situation where some code needs to be executed after web service calls have completed. For example, when user clicks on the row in the data grid a dialog box is shown that allows editing of the record. It contains numerous combo boxes, check boxes etc. So I need to first load data for each of the combo boxes, and than when all finished loading, I need to set the bound entity. Since I am new to this async thing, I was thinking to have some kind of counter that will keep track on how many calls have been dispatched, and as they finish reduce them by one, until it is zero, at which point I could raise an event that load has finished, and I could proceed with what ever is dependent on this. But this seems very clunky way of doing it. I am sure many have faced this issue, so how do you do this. If it helps, we use Prism with MVVM approach and Ria Services with Dtos.
What you've described is pretty much the way to go. There may be more elegant things you can do with locks and mutexes, but your counter will work. It has the bonus that you can see how many operations are still "in progress" at any one time.
You could dispatch your events sequentially but that would defeat the whole purpose of asynchronous operations.
If you analysed what each part of your UI needs you might be able to do some operations before all of your async events have finished. Making sure you start the longest running operations first might help - but there's no guarantee that the other shorter operations will finish first. It all depends on what resources are available on both the client and server at the time the call is made.
howzit!
I'm a web developer that has been recently requested to develop a Windows forms application, so please bear with me (or dont laugh!) if my question is a bit elementary.
After many sessions with my client, we eventually decided on an interface that contains a tabcontrol with 5 tabs. Each tab has a datagridview that may eventually hold up to 25,000 rows of data (with about 6 columns each). I have successfully managed to bind the grids when the tab page is loaded and it works fine for a few records, but the UI freezes when I bound the grid with 20,000 dummy records. The "freeze" occurs when I click on the tab itself, and the UI only frees up (and the tab page is rendered) once the bind is complete.
I communicated this to the client and mentioned the option of paging for each grid, but she is adament w.r.t. NOT wanting this. My only option then is to look for some asynchronous method of doing this in the background. I don't know much about threading in windows forms, but I know that I can use the BackgroundWorker control to achieve this. My only issue after reading up a bit on it is that it is ideally used for "long-running" tasks and I/O operations.
My questions:
How does one determine a long-running task?
How does one NOT MISUSE the BackgroundWorker control, ie. is there a general guideline to follow when using this? (I understand that opening/spawning multiple threads may be undesirable in certain instances)
Most importantly: How can I achieve (asychronously) binding of the datagridview after the tab page - and all its child controls - loads.
Thank you for reading this (ahem) lengthy query, and I highly appreciate any responses/thoughts/directions on this matter!
Cheers!
There's no hard and fast rule for determining a long-running task. It's something you have to know as a developer. You have to understand the nature of your data and your architecture. For example, if you expect to fetch some info from a desktop database with a single user from a table that contains a couple dozen rows you might not even bother showing a wait cursor. But if you're fetching hundreds of rows of data across a network to a shared database sever then you'd better expect that it will potentially be a long-running task to be handled not simply with a wait cursor but a thread that frees up your UI for the duration of the fetch. (You're definitely on the right track here.)
BackgroundWorker is a quick and dirty way of handling threading in forms. In your case, it will very much tie the fetching of data to the user interface. It is doable, works fine but certainly is not considered "best practice" for threading, OOP, separation of concerns etc. And if you're worried about abusing the alocation of threads you might want to read up on the ThreadPool.
Here's a nice example of using asynchronous threading with the thread pool. To do data binding, you fetch your data in the thread and when you get your callback, simply assign the result set to the the grid view's datasource property.