Watch items being added to the SQL table - WPF

I have an application that generates a lot of numbers and writes them to a table in a SQL database.
While the process writes the numbers to the table, I want to give the user an option to watch the numbers that have already been written to the DB, and I want to do it "live".
The thing is that I have a DLL that handles the DB management, and my UI uses this DLL. So I can't bind the ListBox to the table of numbers, because the UI doesn't "know" that table... What would be the best solution for that?

Well, there are three solutions that come to my mind:
1. Poll the DB periodically from the user-facing app: a thread runs every x seconds, picks up the new values from the DB, and hands them to the client.
2. If you are using SQL Server, you can use its notification features (query notifications / SqlDependency) to push the changes to the client, so you don't have to poll.
3. The last option is probably overkill for you: the producing app can notify the displaying app directly, e.g. via a message queue or a WCF call.
IMHO option 1 fits you best, but I cannot judge more precisely because I don't know the expected load.
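A minimal sketch of option 1, assuming a WPF client with a ListBox bound to an ObservableCollection and a hypothetical Numbers(Id, Value) table; in the real app the DLL's data-access API would replace the inline ADO.NET used here.

```csharp
// Sketch only: poll for rows added since the last check and append them to the bound collection.
// The table name dbo.Numbers and its columns are assumptions, not from the original post.
using System;
using System.Collections.ObjectModel;
using System.Data.SqlClient;
using System.Windows.Threading;

public class NumbersViewModel
{
    private readonly string _connectionString;
    private long _lastSeenId;                              // highest Id already shown in the UI

    public ObservableCollection<long> Items { get; } = new ObservableCollection<long>();

    public NumbersViewModel(string connectionString)
    {
        _connectionString = connectionString;
        var timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(2) };
        timer.Tick += (s, e) => PollNewRows();             // runs on the UI thread; for heavy loads,
        timer.Start();                                     // move the query to a background task
    }

    private void PollNewRows()
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, Value FROM dbo.Numbers WHERE Id > @lastId ORDER BY Id", conn))
        {
            cmd.Parameters.AddWithValue("@lastId", _lastSeenId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    _lastSeenId = reader.GetInt64(0);      // remember the watermark
                    Items.Add(reader.GetInt64(1));         // ObservableCollection notifies the ListBox
                }
            }
        }
    }
}
```

Binding the ListBox's ItemsSource to Items keeps the UI in sync without the UI project ever referencing the table directly; if query notifications are available on the database, option 2 could replace the timer with a SqlDependency push instead.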


Processing a million records as a batch in BizTalk

I am looking at suggestions on how to tackle this and whether I am using the right tool for the job. I work primarily on BizTalk and we are currently using BizTalk 2013 R2 with SQL 2014.
Problem:
We will be receiving positional flat files every day (around 50) from various partners, and the theoretical total number of records received would be over a million. Each record has some identifying information that needs to be sent to a web service, which essentially comes back with a YES or NO; based on that response, the incoming file is split into two files.
Originally, the expected daily volume was 10k records, which later ballooned to 100k and is now at a million records.
Attempt 1: Scatter-Gather pattern
I am debatching the records in a custom pipeline using the file disassembler and adding a couple of port-configurable properties for the scatter part (following Richard Seroter's suggestion of implementing a round-robin assignment), which lets me control the number of scatter/worker orchestrations I spin up to call the web service and mark the records to be sent to 'Agency A' or 'Agency B'. Finally, I push a control message that spins up the gather/aggregator orchestration, which collects all the messages processed by the workers from the MessageBox via correlation and creates the two files to be routed to Agency A and Agency B.
So every file that gets dropped has its own set of workers and an aggregator that process the file.
This works well for files with a smaller number of records, but if a file has over 100k records I see throttling kick in, and it takes a long time to process the file and generate the two output files.
I have put the receive location/workers and the aggregator/send port on separate hosts.
It appears that the gatherer is dehydrated and does not really aggregate the records processed by the workers until all of them are done, and I think it is throttling because the ratio of messages published vs. processed is very large.
Approach 2:
Assuming that the aggregator orchestration is the bottleneck, instead of accumulating the records in an orchestration I pushed the processed records to a SQL DB and 'split' them into two XML files (basically concatenating the messages going to Agency A/B, wrapping them in an XML declaration and using the correct message type, based on some of the context properties written to the SQL table along with each record).
These aggregated XML records are polled and routed to the right agencies.
This seems to work okay with 100k records and completes in an acceptable amount of time. Now that the goal posts/requirements have again changed with regard to expected volume, I am trying to see if BizTalk is even a feasible choice anymore.
I have indicated that BizTalk is not the right tool for this kind of task, but the client is suggesting we add more servers to make it work. I am also looking at SSIS.
Meanwhile, some observations from testing:
Increasing the number of workers improved processing (duh): it looks like when each worker has fewer records in its queue/subscription, it finishes its queue quickly. When testing the 100k record file, using 100 workers completed in under 3 hrs. This is with minimal activity on the server from other applications.
I am trying to get the web service hosting team to give me a theoretical maximum number of concurrent connections they can handle. I am leaning towards asking whether they can handle 1000 concurrent calls; based on my observations, the existing solution might then scale.
I have adjusted a few settings for the host with regard to message count and the physical memory threshold so it won't balk at the volume, but I am still unsure. I haven't had to touch these settings before and could use advice on which particular counters to monitor.
The post is a bit long, but I am hoping this gives an idea of what I have done so far. Any help/insight in tackling this problem is appreciated. If you are suggesting alternatives, I am restricted to .NET or MS-based tools/frameworks, but would love to hear about other options as well.
I will try to answer or give more detail if you want to clarify or understand something I didn't make clear.
First, 1 million records/messages is not the issue, but you can make it a problem by handling it poorly.
Here's the pattern I would lay out first.
1. Load the records into SQL Server with SSIS. This will be very fast.
2. Process/drain the records into your BizTalk app for... well, whatever needs to be done: calling the service, etc.
3. Update the SQL record with the result.
4. When that process is complete, query out the Yes and No batches as one (large) message each, transform and send.
My guess is the Web Service will be the bottleneck unless it's specifically designed for such a load. You will probably have to tune BizTalk to throttle only when necessary but don't worry about that just yet. A good app pattern is more important.
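If steps 2 and 3 end up outside of BizTalk (or in a helper the orchestration calls), the drain/call/update loop can be sketched in plain .NET as below. Everything here is hypothetical (dbo.Records, the Decision column and ILookupService stand in for the real schema and web service); the point is the bounded concurrency, which is the knob to negotiate with the web service team.

```csharp
// Sketch only: drain staged records, call the lookup service with bounded concurrency,
// and write the YES/NO result back. dbo.Records, Decision and ILookupService are hypothetical.
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

public interface ILookupService
{
    Task<bool> CheckAsync(string identifyingInfo);   // the YES/NO decision from the web service
}

public class RecordDrainer
{
    private sealed class PendingRecord { public long Id; public string IdentifyingInfo; }

    private readonly string _connectionString;
    private readonly ILookupService _service;
    private readonly SemaphoreSlim _throttle = new SemaphoreSlim(50);   // max concurrent service calls

    public RecordDrainer(string connectionString, ILookupService service)
    {
        _connectionString = connectionString;
        _service = service;
    }

    public async Task ProcessPendingAsync()
    {
        // Step 2: drain the records that were bulk loaded and have no decision yet.
        var pending = new List<PendingRecord>();
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, IdentifyingInfo FROM dbo.Records WHERE Decision IS NULL", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    pending.Add(new PendingRecord
                    {
                        Id = reader.GetInt64(0),
                        IdentifyingInfo = reader.GetString(1)
                    });
        }

        var tasks = new List<Task>();
        foreach (var record in pending)
            tasks.Add(ProcessOneAsync(record));
        await Task.WhenAll(tasks);
    }

    private async Task ProcessOneAsync(PendingRecord record)
    {
        await _throttle.WaitAsync();
        try
        {
            bool accepted = await _service.CheckAsync(record.IdentifyingInfo);

            // Step 3: update the SQL record with the result.
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand(
                "UPDATE dbo.Records SET Decision = @d WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@d", accepted ? "Y" : "N");
                cmd.Parameters.AddWithValue("@id", record.Id);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
        finally
        {
            _throttle.Release();
        }
    }
}
```

The SemaphoreSlim plays the same role as the worker count in the scatter/gather attempt: it caps how many concurrent calls hit the web service, which is expected to be the real bottleneck.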
In such scenarios, you should consider the following approach:
- De-batch the file and store the individual records in MSMQ. You can easily achieve this without any extra coding effort; all you need is to create a send port using the MSMQ adapter, or WCF-Custom with the netMsmqBinding. If required, you can also create separate queues depending on different criteria you may have in your messages.
- Receive the messages from MSMQ using a receive location on a separate host.
- Send them to the web service on a different BizTalk host.
- Try to stick to messaging-only scenarios; you can handle the service response with a pipeline component if required, and you can apply a map on the send port itself. In the worst case, if you need an orchestration, it should only handle the processing of a single message, without any complex pattern.
- You can then push the messages back to two MSMQ queues, one for each agency, based on the web service response.
- You can then receive those messages again and write them to file; you can simply use a send port with the file append option, or use a custom pipeline component to write the received messages to file without aggregating them in an orchestration. You can gather them in an orchestration if you have no more than a few thousand messages per file.
With this approach you won't have any bottleneck within BizTalk, and you don't need a complex orchestration pattern, which usually ends up with many persistence points.
If the web service becomes a bottleneck, you can control the rate of messages received from MSMQ by 1) enabling Ordered Delivery on the MSMQ receive location and, if required, 2) using BizTalk host throttling: change the Message Count in DB property to a very low number (e.g. 1000 instead of the 50K default) and increase the Spool and Tracking Data Multipliers accordingly (e.g. 500 instead of the default 10), so that the product of the two numbers is high enough not to cause throttling from the messages within BizTalk. You can also reduce the number of worker threads on the BizTalk host to slow it down a little.
Please note that MSMQ is part of the Windows OS and does not require any additional setup. It is usually installed by default; if not, you can add it via Add/Remove Windows Features. You can also use IBM MQ if your organization has the infrastructure, but for one million messages MSMQ will be just fine.
Apologies for the late update.
We've decided to use SSIS to bulk import the file into a table. Since the lookup web service is part of the same organization and network (although using a different stack), they have agreed to let us query the lookup table their web service is based on, and we use a 'merge' between those tables to identify 'Y' or 'N' and export the results out via SSIS as well.
In short, we've skipped BizTalk. A 1.5 million record file is now processed and the split files sent within a couple of minutes.
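For illustration only, a rough ADO.NET equivalent of that SSIS flow, with all object names hypothetical: bulk load the parsed file into a staging table, then make the Y/N decision with a single set-based statement against the provider's lookup table instead of one service call per record.

```csharp
// Sketch only: bulk load, then a set-based Y/N decision. All table and column names
// (dbo.StagingRecords, Provider.dbo.LookupTable, Decision, IdentifyingInfo) are hypothetical.
using System.Data;
using System.Data.SqlClient;

public static class BatchDecision
{
    public static void LoadAndMark(DataTable parsedRecords, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Step 1: bulk load (the SSIS data flow in the real solution).
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.StagingRecords" })
            {
                bulk.WriteToServer(parsedRecords);
            }

            // Step 2: one set-based update instead of a million web service calls.
            const string markSql = @"
                UPDATE s
                SET s.Decision = CASE WHEN l.Id IS NOT NULL THEN 'Y' ELSE 'N' END
                FROM dbo.StagingRecords AS s
                LEFT JOIN Provider.dbo.LookupTable AS l
                    ON l.IdentifyingInfo = s.IdentifyingInfo;";
            using (var cmd = new SqlCommand(markSql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```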
Appreciate all the advice provided here.

Reliable asynchronous processing in SQL Server

Some of the services are provided to our customers by a 3rd party. The data which is created on their remote servers is replicated to an on-premises SQL server.
I need to perform some work on that 3rd-party server, whose database is not directly accessible to me. They expose a set of APIs for that purpose. The work is performed on a linked SQL server by a SQL Server Agent job.
Business scenario: customers can receive "badges". A badge can be given to a customer by calling the UpdateCustomerBadgeInfo web method on the 3rd-party server.
So a typical requirement for an automated task would look like this:
"Find all customers who logged in more than 50 times during theday, give them the [has-no-life] badge and send them an SMS notification"
The algorithm would be:
- Select all the matching accounts into a #TempTable
- For each customer record:
  - Call the UpdateCustomerBadgeInfo() method (via CLR)
  - If the badge info was updated successfully, enqueue an SMS message (queue table)
  - Log the successful action (so that the record will not be picked up next time)
The biggest problem with the way it works now is that it takes a lot of time to process large datasets in a WHILE loop.
So the 3rd party provider created a solution to perform batch updates of the customer data. They created a table on the on-premises SQL server to which batch update requests are submitted and later picked up by their service for validation and processing.
The question is: how should the above algorithm be changed to fit this asynchronous model?
This answer is valid only if I understood the situation correctly:
- The 3rd-party server used to expose a web method to update customers one by one.
- Now they expect to get this information from a SQL Server table that is available to you for INSERT/UPDATE/DELETE.
- You can just stuff your customer-related requests into this table and they will be processed some time later.
- When the customer-related info gets updated, you have to perform some additional local actions (queue the SMS, log the activity).
Generally, I don't see any significant changes to the algorithm, but I will try to explain what I would do in this case.
Select all the matching accounts into a #TempTable
This may not be necessary, because you already have a table to stuff your requests into: the 3rd-party table. The only problem would be synchronizing requests, but to analyze that you would have to provide more details (are multiple requests for the same customer allowed? is there protection against re-issuing the same request?).
for each customer record...
This should be the only change in your implementation. It now means: for each customer record that has been asynchronously processed on the 3rd-party side. Of course, the 3rd party must give you some clue that they really did process your customer request, or you have no idea what to work with. So, when they validate and process the data, they could provide e.g. nullable columns 'success_time' and 'error_time' to leave you a message about what has been done and when. If there is a success, you continue with your processing. If not, you can probably do something about that as well.
But how do you react when you get the async information back (e.g. success_time IS NOT NULL)? Well, there are multiple ways to do that. Personally, I try to avoid triggers because they can make your life complicated (their visibility sucks, they can cause problems with replication, they can cause problems with transactions...); I use them only if I really need first-class, immediate responsiveness. Another possibility is using async queues with custom activation, which means Service Broker. However, a lot of people avoid SB technology: it's different from the rest of SQL Server, it has its specifics, debugging is not as easy as with plain old SQL statements, etc. Another possibility would be batch-processing the async responses on your side using an Agent job. Since you are already using a job, you should be fine with that.
Basically, the table should act as a synchronization point: you fill in your requests (INSERT), the 3rd party processes them (SELECT), after the requests get processed they mark them as such (UPDATE success_time or error_time), and at the end you process that response (SELECT) in your Agent job task. Your processing includes the SMS message and the logging, and maybe even DELETEing from the 3rd-party table.
Another thing to mention is that you need synchronization methods here. First, don't do anything without transactions, or you may end up processing ghost responses and/or skipping valid waiting responses. Second, when you SELECT responses (rows that have been processed on the 3rd-party side), you could get some improvement from the READPAST hint (skip what is locked). However, if you need to update/delete from the 3rd-party table after processing a response, you may use SELECT with UPDLOCK to keep the other side from tampering with the data between your SELECT and your UPDATE/DELETE. Or don't use any locking hints at all if you are not completely sure what goes on with the table in question.
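A minimal sketch of that Agent-job processing step, assuming a hypothetical dbo.BadgeRequests table (with success_time and a local processed flag) plus queue and log tables; the SELECT uses the UPDLOCK and READPAST hints discussed above and everything runs in one transaction.

```csharp
// Sketch only: process asynchronous responses from the hypothetical 3rd-party table
// dbo.BadgeRequests(RequestId, CustomerId, success_time, error_time, processed_locally).
// READPAST skips rows the other side is still touching; UPDLOCK keeps the selected rows
// stable until we mark them processed; the transaction makes the whole step all-or-nothing.
using System.Collections.Generic;
using System.Data.SqlClient;

public static class BadgeResponseProcessor
{
    private sealed class CompletedRequest { public long RequestId; public long CustomerId; }

    public static void ProcessCompletedRequests(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                const string selectSql = @"
                    SELECT RequestId, CustomerId
                    FROM dbo.BadgeRequests WITH (UPDLOCK, READPAST)
                    WHERE success_time IS NOT NULL
                      AND processed_locally = 0;";

                var completed = new List<CompletedRequest>();
                using (var cmd = new SqlCommand(selectSql, conn, tx))
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        completed.Add(new CompletedRequest
                        {
                            RequestId = reader.GetInt64(0),
                            CustomerId = reader.GetInt64(1)
                        });

                foreach (var item in completed)
                {
                    // Enqueue the SMS notification and log the action (queue/log tables are assumed).
                    using (var enqueue = new SqlCommand(
                        "INSERT INTO dbo.SmsQueue (CustomerId) VALUES (@c); " +
                        "INSERT INTO dbo.BadgeLog (CustomerId, LoggedAt) VALUES (@c, SYSUTCDATETIME());",
                        conn, tx))
                    {
                        enqueue.Parameters.AddWithValue("@c", item.CustomerId);
                        enqueue.ExecuteNonQuery();
                    }

                    // Mark the request so it is not picked up again next time.
                    using (var mark = new SqlCommand(
                        "UPDATE dbo.BadgeRequests SET processed_locally = 1 WHERE RequestId = @r;",
                        conn, tx))
                    {
                        mark.Parameters.AddWithValue("@r", item.RequestId);
                        mark.ExecuteNonQuery();
                    }
                }

                tx.Commit();
            }
        }
    }
}
```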
Hope it helps.

How efficient can Meteor be while sharing a huge collection among many clients?

Imagine the following case:
1,000 clients are connected to a Meteor page displaying the content of the "Somestuff" collection.
"Somestuff" is a collection holding 1,000 items.
Someone inserts a new item into the "Somestuff" collection
What will happen:
All Meteor.Collections on the clients will be updated, i.e. the insertion is forwarded to all of them (which means one insertion message sent to 1,000 clients)
What is the cost in terms of CPU for the server to determine which client needs to be updated?
Is it accurate that only the inserted value will be forwarded to the clients, and not the whole list?
How does this work in real life? Are there any benchmarks or experiments of such scale available?
The short answer is that only new data gets sent down the wire. Here's
how it works.
There are three important parts of the Meteor server that manage
subscriptions: the publish function, which defines the logic for what
data the subscription provides; the Mongo driver, which watches the
database for changes; and the merge box, which combines all of a
client's active subscriptions and sends them out over the network to the
client.
Publish functions
Each time a Meteor client subscribes to a collection, the server runs a
publish function. The publish function's job is to figure out the set
of documents that its client should have and send each document property
into the merge box. It runs once for each new subscribing client. You
can put any JavaScript you want in the publish function, such as
arbitrarily complex access control using this.userId. The publish
function sends data into the merge box by calling this.added, this.changed and
this.removed. See the
full publish documentation for
more details.
Most publish functions don't have to muck around with the low-level
added, changed and removed API, though. If a publish function returns a Mongo
cursor, the Meteor server automatically connects the output of the Mongo
driver (insert, update, and removed callbacks) to the input of the
merge box (this.added, this.changed and this.removed). It's pretty neat
that you can do all the permission checks up front in a publish function and
then directly connect the database driver to the merge box without any user
code in the way. And when autopublish is turned on, even this little bit is
hidden: the server automatically sets up a query for all documents in each
collection and pushes them into the merge box.
On the other hand, you aren't limited to publishing database queries.
For example, you can write a publish function that reads a GPS position
from a device inside a Meteor.setInterval, or polls a legacy REST API
from another web service. In those cases, you'd emit changes to the
merge box by calling the low-level added, changed and removed DDP API.
The Mongo driver
The Mongo driver's job is to watch the Mongo database for changes to
live queries. These queries run continuously and return updates as the
results change by calling added, removed, and changed callbacks.
Mongo is not a real time database. So the driver polls. It keeps an
in-memory copy of the last query result for each active live query. On
each polling cycle, it compares the new result with the previous saved
result, computing the minimum set of added, removed, and changed
events that describe the difference. If multiple callers register
callbacks for the same live query, the driver only watches one copy of
the query, calling each registered callback with the same result.
Each time the server updates a collection, the driver recalculates each
live query on that collection. (Future versions of Meteor will expose a
scaling API for limiting which live queries recalculate on update.) The
driver also polls each live query on a 10 second timer to catch
out-of-band database updates that bypassed the Meteor server.
The merge box
The job of the merge box is to combine the results (added, changed and removed
calls) of all of a client's active publish functions into a single data
stream. There is one merge box for each connected client. It holds a
complete copy of the client's minimongo cache.
In your example with just a single subscription, the merge box is
essentially a pass-through. But a more complex app can have multiple
subscriptions which might overlap. If two subscriptions both set the
same attribute on the same document, the merge box decides which value
takes priority and only sends that to the client. We haven't exposed
the API for setting subscription priority yet. For now, priority is
determined by the order the client subscribes to data sets. The first
subscription a client makes has the highest priority, the second
subscription is next highest, and so on.
Because the merge box holds the client's state, it can send the minimum
amount of data to keep each client up to date, no matter what a publish
function feeds it.
What happens on an update
So now we've set the stage for your scenario.
We have 1,000 connected clients. Each is subscribed to the same live
Mongo query (Somestuff.find({})). Since the query is the same for each client, the driver is
only running one live query. There are 1,000 active merge boxes. And
each client's publish function registered an added, changed, and
removed on that live query that feeds into one of the merge boxes.
Nothing else is connected to the merge boxes.
First the Mongo driver. When one of the clients inserts a new document
into Somestuff, it triggers a recomputation. The Mongo driver reruns
the query for all documents in Somestuff, compares the result to the
previous result in memory, finds that there is one new document, and
calls each of the 1,000 registered insert callbacks.
Next, the publish functions. There's very little happening here: each
of the 1,000 insert callbacks pushes data into the merge box by
calling added.
Finally, each merge box checks these new attributes against its
in-memory copy of its client's cache. In each case, it finds that the
values aren't yet on the client and don't shadow an existing value. So
the merge box emits a DDP DATA message on the SockJS connection to its
client and updates its server-side in-memory copy.
Total CPU cost is the cost to diff one Mongo query, plus the cost of
1,000 merge boxes checking their clients' state and constructing a new
DDP message payload. The only data that flows over the wire is a single
JSON object sent to each of the 1,000 clients, corresponding to the new
document in the database, plus one RPC message to the server from the
client that made the original insert.
Optimizations
Here's what we definitely have planned.
- More efficient Mongo driver. We optimized the driver in 0.5.1 to only run a single observer per distinct query.
- Not every DB change should trigger a recomputation of a query. We can make some automated improvements, but the best approach is an API that lets the developer specify which queries need to rerun. For example, it's obvious to a developer that inserting a message into one chatroom should not invalidate a live query for the messages in a second room.
- The Mongo driver, publish function, and merge box don't need to run in the same process, or even on the same machine. Some applications run complex live queries and need more CPU to watch the database. Others have only a few distinct queries (imagine a blog engine), but possibly many connected clients -- these need more CPU for merge boxes. Separating these components will let us scale each piece independently.
- Many databases support triggers that fire when a row is updated and provide the old and new rows. With that feature, a database driver could register a trigger instead of polling for changes.
From my experience, using many clients while sharing a huge collection in Meteor is essentially unworkable, as of version 0.7.0.1. I'll try to explain why.
As described in the above post and also in https://github.com/meteor/meteor/issues/1821, the meteor server has to keep a copy of the published data for each client in the merge box. This is what allows the Meteor magic to happen, but also results in any large shared databases being repeatedly kept in the memory of the node process. Even when using a possible optimization for static collections such as in (Is there a way to tell meteor a collection is static (will never change)?), we experienced a huge problem with the CPU and Memory usage of the Node process.
In our case, we were publishing a collection of 15k documents to each client that was completely static. The problem is that copying these documents to a client's merge box (in memory) upon connection basically brought the Node process to 100% CPU for almost a second, and resulted in a large additional usage of memory. This is inherently unscalable, because any connecting client will bring the server to its knees (and simultaneous connections will block each other) and memory usage will go up linearly in the number of clients. In our case, each client caused an additional ~60MB of memory usage, even though the raw data transferred was only about 5MB.
In our case, because the collection was static, we solved this problem by sending all the documents as a .json file, which was gzipped by nginx, and loading them into an anonymous collection, resulting in only a ~1MB transfer of data with no additional CPU or memory in the node process and a much faster load time. All operations over this collection were done by using _ids from much smaller publications on the server, allowing for retaining most of the benefits of Meteor. This allowed the app to scale to many more clients. In addition, because our app is mostly read-only, we further improved the scalability by running multiple Meteor instances behind nginx with load balancing (though with a single Mongo), as each Node instance is single-threaded.
However, the issue of sharing large, writeable collections among multiple clients is an engineering problem that needs to be solved by Meteor. There is probably a better way than keeping a copy of everything for each client, but that requires some serious thought as a distributed systems problem. The current issues of massive CPU and memory usage just won't scale.
The experiment that you can use to answer this question:
Install a test meteor: meteor create --example todos
Run it under Webkit inspector (WKI).
Examine the contents of the XHR messages moving across the wire.
Observe that the entire collection is not moved across the wire.
For tips on how to use WKI check out this article. It's a little out of date, but mostly still valid, especially for this question.
This is now a year old and therefore, I think, pre-"Meteor 1.0" knowledge, so things may have changed again? I'm still looking into this.
http://meteorhacks.com/does-meteor-scale.html leads to a "How to scale Meteor?" article: http://meteorhacks.com/how-to-scale-meteor.html

SilverLight RIA Monitor Tool, need help with design

OK, here's (a simplification of) the situation: the server side has a list of connection strings for different DBs on different machines (values in the relevant tables keep being changed by other software).
Upon request from the client side, the server checks the DBs one by one and has logic that outputs a status string.
The client side should display a datagrid with the machine name and status string for all machines. The idea is that the monitor continually refreshes to show any changes in status for any of the machines.
I've implemented a first draft with RIA Services, which works fine; I've used a DispatcherTimer to keep refreshing the UI.
My question is: in this scenario, is it possible to get automatic updates of the UI whenever any of the underlying DBs change, using RIA bindings, instead of actively initiating the queries from the client with a DispatcherTimer?
Any clues will be really appreciated!
Thanks
Micha
RIA is just a layer on top of WCF service calls. You still need to poll for data changes.
You can reduce the amount of data moved across by having a "lastchanged" value cached on the server side. You poll the lastChanged value first on a regular basis and then only decide to pull the data if that value has changed.
That does of course mean some extra work server-side to update that value when changes occur, but if all changes come in via RIA Services it is pretty easy to hook in.
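A minimal sketch of that idea, deliberately ignoring the real RIA Services async plumbing: IMonitorService, GetLastChanged and GetStatuses are hypothetical stand-ins for whatever domain service operations the app actually exposes, and the cheap timestamp check gates the expensive status query.

```csharp
// Sketch only: poll a cheap "lastChanged" value and pull the full status list only when it moves.
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Windows.Threading;

public interface IMonitorService
{
    DateTime GetLastChanged();                         // cheap call: cached value on the server
    IEnumerable<MachineStatus> GetStatuses();          // expensive call: full status list
}

public class MachineStatus
{
    public string MachineName { get; set; }
    public string Status { get; set; }
}

public class MonitorViewModel
{
    private readonly IMonitorService _service;
    private DateTime _lastSeen = DateTime.MinValue;

    public ObservableCollection<MachineStatus> Statuses { get; } =
        new ObservableCollection<MachineStatus>();

    public MonitorViewModel(IMonitorService service)
    {
        _service = service;
        var timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(5) };
        timer.Tick += (s, e) => Refresh();
        timer.Start();
    }

    private void Refresh()
    {
        var changed = _service.GetLastChanged();
        if (changed <= _lastSeen)
            return;                                    // nothing new: skip the heavy query

        _lastSeen = changed;
        Statuses.Clear();
        foreach (var status in _service.GetStatuses()) // only pulled when something changed
            Statuses.Add(status);
    }
}
```

In the real app the two calls would be asynchronous domain service invocations, but the shape stays the same: poll the timestamp often, pull the grid data only when it moves.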

WCF and SQL Server: changes in DB are displayed only after app restarts

I'm developing a client-server app using WCF and Linq2Sql. My server-side program exposes to the client an interface that provides methods for reading from and writing to my SQL Server DB.
But when the client writes some data into the DB, perhaps waits some time, and then tries to read that data back, it seems like no data has been written to the DB. However, if I restart my server-side app, detach and reattach the DB, or restart the SQL Server service, then my client-side program can get that data from the server-side program.
Does anyone have any idea what's wrong with my app (server?) and how to fix this?
UPDATE: I'm using Linq2Sql (calling DataContext.SubmitChanges()).
UPDATE 2: I've discovered that if I add new rows to my table, everything is correct, but when I update parts of a row (some properties of objects) and then save the changes, the changes only show up after reconnecting to the DB. It appears the data is not flushed immediately after updating some properties and invoking DataContext.SubmitChanges().
I don't have an answer, but some ideas for how to further track down the issue.
How do you write to the DB? Do you use transactions that maybe remain open? Can you query the updates in the database when they don't show up in your WCF response? Does your update hold locks and somehow not release them? Did you eliminate caching as the cause?
Try remote-debugging to find out what happens on the server. A WCF trace might be helpful, too.
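One common cause of exactly this symptom (not confirmed by the post) is a single long-lived LINQ to SQL DataContext on the server: its identity map keeps handing back the entity instances it already materialized, so later reads look stale until the process restarts. A sketch of scoping a fresh DataContext per WCF operation, with a hypothetical Numbers table mapping standing in for the generated context:

```csharp
// Sketch only: the mapping below stands in for the designer-generated LINQ to SQL context.
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

[Table(Name = "Numbers")]
public class Number
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public long Value { get; set; }
}

public class NumbersDataContext : DataContext
{
    public NumbersDataContext(string connectionString) : base(connectionString) { }
    public Table<Number> Numbers { get { return GetTable<Number>(); } }
}

public class NumbersService // the class implementing the WCF service contract
{
    private readonly string _connectionString;

    public NumbersService(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void UpdateNumber(int id, long newValue)
    {
        using (var db = new NumbersDataContext(_connectionString))   // fresh context per call
        {
            var row = db.Numbers.Single(n => n.Id == id);
            row.Value = newValue;
            db.SubmitChanges();                                      // flushed and disposed right here
        }
    }

    public long[] ReadNumbers()
    {
        using (var db = new NumbersDataContext(_connectionString))   // reads get a fresh context too,
        {                                                            // so no stale cached rows
            return db.Numbers.Select(n => n.Value).ToArray();
        }
    }
}
```

If the context must live longer (for example per session), setting ObjectTrackingEnabled = false on read-only contexts or calling Refresh() are the usual alternatives, but per-call scoping is the simplest fix to try first.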
