Two-way SQL Service Broker - sql-server

I'm trying to use SSB as full-duplex messaging infra for multiple distributed logical stations.
Several stations can reside on same process, or on different machines (doesn't matter).
The stations need to communicate and sync with each other by constantly sending messages back and forth.
The stations run as part of a Windows Service, so, a station's life-time is very long.
Each message a station sends may be designated to a single station or to multiple stations or to all stations (broadcast).
A message is relevant to a specific station only if it's designated to that specific station, or if it's a broadcast message.
All of SSB's Dialog/Conversation/Group terminology really got me confused.
I can't figure out how to determine who should become an Initiator/Target and when, because in my case each station can send a message whenever it needs to, and should receive relevant messages all the time.
Since many stations might send messages to many other stations, all at the same time, dequeuing should be as quick as possible and performance must be optimal.
According to Microsoft, I should use many conversations with many messages for optimal performance.
But I can't figure out when and how I should create a separate dialog/conversation, and when should a conversation end, if at all.
Can someone please shed some light on this, and give me a proper direction for my case?
Thank you.

Assuming that each station has an inventory of all of the others (which may take some hand holding!), this should be doable.
In regards to the initiator/target terminology, whoever starts the conversation is the initiator and the recipients of that message are targets. Imagine the following real-world conversation; I'll prepend messages sent by the initiator with [I] and those sent by the target with [T]:
[I] Hello department store. Tell me what kinds of socks you have for purchase.
[T] We sell red, white, and blue socks.
[I] I'll take two pairs of red socks.
[T] Okay
In that case, there was only one target, but a given conversation can be initiated with multiple targets in mind (in your "multiple stations" or "broadcast" scenarios). To further the above analogy, it'd be akin to calling multiple stores at once and asking them all for their sock inventory.
To see why you'd need to know who is the initiator and who is the target, the above conversation could be broken down into the following message types: InventoryRequest, InventoryResponse, PurchaseRequest, PurchaseResponse. If you bundle those all into one contract, you should be off to the races.
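To make the contract idea concrete, here is a minimal Python model (an illustration only, not Service Broker code) of those four message types, each tagged with the side allowed to send it, mirroring a contract's SENT BY INITIATOR / SENT BY TARGET clauses:

```python
from enum import Enum

class SentBy(Enum):
    INITIATOR = "I"
    TARGET = "T"

# The four message types from the sock-store conversation, tagged with
# the side allowed to send each one.
CONTRACT = {
    "InventoryRequest": SentBy.INITIATOR,
    "InventoryResponse": SentBy.TARGET,
    "PurchaseRequest": SentBy.INITIATOR,
    "PurchaseResponse": SentBy.TARGET,
}

def may_send(message_type: str, sender: SentBy) -> bool:
    """A message is valid on a conversation only if the contract says
    this side is allowed to send that type."""
    return CONTRACT.get(message_type) == sender
```

The same shape carries over to the real T-SQL contract: Service Broker rejects a message whose type the sending side is not allowed to use.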


is it better to define one topic with different types or to define typed-topic for each type?
For example,
[topic]
foo
[message]
{
type: 'x',
data: [1,2,3]
}
OR
[topic]
foo-x
[message]
{
data: [1,2,3]
}
According to Pub/Sub pricing, the latter case seems slightly cheaper because of the smaller message size, given that the gross number of messages sent and delivered is the same.
But that is only pricing; I suspect there are established design patterns or schemas for defining messages. What else should be considered when choosing between these two?
Prefer the second solution. Indeed, if you want to plug processing (Cloud Functions, Cloud Run, App Engine or another compute platform) onto a particular topic, you can do it because the topic is typed.
If the topic is not typed, you have to implement a filter in your process: "if my type is X then process, else exit". And you are charged for this. For example, Cloud Functions and Cloud Run are charged per request and per processing time, with a minimum billing of 100ms, which is too much for a simple IF.
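For illustration, a minimal Python sketch of the filter such a subscriber would have to run on an untyped topic; the message shape and the process helper are assumptions, not the Pub/Sub client API:

```python
import json

def handle_message(raw_data: bytes) -> bool:
    """Subscriber for the untyped 'foo' topic: every message must be
    decoded just to check its type, even if we end up discarding it."""
    message = json.loads(raw_data)
    if message.get("type") != "x":
        return False          # billed for the invocation anyway
    process(message["data"])  # hypothetical downstream processing
    return True

def process(data):
    print("processing", data)
```

With a typed topic ("foo-x"), every delivered message is known to be of type "x" and the branch disappears.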
However, if you want to plug the same function onto several topics, you have to duplicate the function (unless you use an HTTP trigger).
Hope this helps!
There isn't one right or wrong answer here. In part, it depends on the nature of your subscribers. Are they going to be interested in messages of all types or in messages of only one type? If the former, then a single topic with the type in the message might be better. If the latter, then separate subscriptions might be better. Additionally, how many types do you have? For dozens or more types, managing individual topics may be a lot of overhead.
Price would likely not be a factor. If your messages are large, then the price will be dominated by the contents of the messages. If your messages are small, then they will be subject to the 1KB minimum billing volume per request. The only way the type would be a factor is if you have very small messages, but batch them such that they are at least 1KB.
If you do decide to encode the type in the message, consider putting it in the message attributes instead of the message data. That way, you can look at the type without having to decode the entire message, which can be helpful if you want to immediately ack messages of certain types instead of processing them.
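A sketch of that idea, with the message modeled as a plain dict (the real Pub/Sub message object exposes a binary payload and string attributes in a similar way):

```python
import json

def should_ack_immediately(attributes: dict) -> bool:
    """Decide from attributes alone, without decoding the payload.
    The attribute values here are made-up examples."""
    return attributes.get("type") in {"heartbeat", "debug"}

# Payload plus attributes, modeled as a plain dict for illustration.
message = {
    "data": json.dumps({"data": [1, 2, 3]}).encode(),
    "attributes": {"type": "heartbeat"},
}

if should_ack_immediately(message["attributes"]):
    pass  # message.ack() in the real client: the payload is never decoded
```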

Amazon MWS determine if product is in BuyBox

I am integrating the Amazon MWS API. When importing products I need one important field: whether the seller's product is the Buy Box winner or not. I need to set a flag in our DB.
I have checked all the relevant product APIs in the Amazon MWS Scratchpad but have had no luck finding this information.
The winner of the buy box can (and does) change very frequently, depending on the number of sellers for the product. The best way to get up-to-the-minute notifications of a product's buy box status is to subscribe to the AnyOfferChangedNotification: https://docs.developer.amazonservices.com/en_US/notifications/Notifications_AnyOfferChangedNotification.html
You can use those notifications to update your database. Another option is the Products API which has a GetLowestPricedOffersForASIN operation which will tell you if your ASIN is currently in the buy box. http://docs.developer.amazonservices.com/en_US/products/Products_GetLowestPricedOffersForASIN.html
Look for IsBuyBoxWinner.
While the question is old, it could still be useful to someone to have a complete answer about the Products API solution.
In the Products API there is GetLowestPricedOffersForSKU (slightly different from GetLowestPricedOffersForASIN), which has, in addition to the "IsBuyBoxWinner" information, the "MyOffer" information. The two values combined can tell you whether you have the Buy Box.
Keep in mind that the API call limits for both are very strict (200 requests max per hour), so with a very high number of offers the subscription to "AnyOfferChangedNotification" is the only real option. It requires further development to consume those notifications, though, so it is by no means simple.
One thing to consider is that the AnyOfferChangedNotification cannot push to a FIFO (first-in, first-out) SQS queue; it can only push to a standard (unordered) SQS queue. I thought I was being smart when I set up two threads in my application, one to download the messages and one to process them. However, when you download messages you can get them from anywhere in the SQS queue. To be successful you need to at least:
1) Download all the messages to your local cache/buffer/db until Amazon returns 'there are no more messages'.
2) Run your process off that local buffer, which is complete and current as of the time you got the last 'no more messages' response from Amazon.
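Those two steps can be sketched as follows; the batch source here is a stand-in for the real SQS client, not the boto API:

```python
def drain_queue(receive_batch) -> list:
    """Keep polling until the queue reports no more messages, buffering
    everything locally before any processing starts."""
    buffer = []
    while True:
        batch = receive_batch()
        if not batch:          # the 'no more messages' response
            return buffer
        buffer.extend(batch)

# Stand-in for the SQS receive call: yields two batches, then nothing.
_batches = [["msg1", "msg2"], ["msg3"], []]
buffer = drain_queue(lambda: _batches.pop(0))
for message in buffer:         # process only the complete local buffer
    ...
```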
It is not clear from Amazon's documentation, but I had a concern that I have not proven yet and that is worth looking into: if an ASIN reprices two or three times quickly, it is not clear whether the messages could arrive in the queue out of order (or whether any one message could be delayed). By 'out of order' I mean that, for one SKU/ASIN, it is not clear whether you can get a message with a more recent 'Time of Offer Change' before one with an older 'Time of Offer Change'. If so, that could create this situation: 1) an ASIN reprices at 12:00:00 and again at 12:00:01 ('Time of Offer Change' times); 2) at 12:01:00 you poll the queue and the later 12:00:01 price change is there, but not the earlier one from 12:00:00; 3) you iterate the SQS queue until you clear it and then do your thing (reprice, send messages, or whatever); then on the next pass you poll the queue again and get the earlier AnyOfferChangedNotification. I added logic to my code to track the 'Time of Offer Change' for each ASIN/SKU and alarm if it rolls backwards.
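The guard described at the end can be sketched like this (the SKU name and timestamps are made up):

```python
from datetime import datetime

latest_change: dict = {}  # per-SKU latest 'Time of Offer Change' seen

def accept_notification(sku: str, offer_change_time: datetime) -> bool:
    """Ignore (and flag) any notification whose offer-change time is
    older than one already seen for the same SKU."""
    seen = latest_change.get(sku)
    if seen is not None and offer_change_time < seen:
        return False  # time rolled backwards: alarm/log instead of repricing
    latest_change[sku] = offer_change_time
    return True
```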
Other things to consider.
1) If you go out of stock on an ASIN/SKU, you stop getting messages. 2) You don't start getting messages on an ASIN/SKU until you ship the item for the first time; just adding it to FBA inventory is not enough. If you need pricing to update earlier than that (or when you go out of stock), you also need to poll GetLowestPricedOffersForASIN.

Sloot Digital Coding System

" In the late 1990s, a Dutch electronics technician named Romke Jan Berhnard Sloot announced the development of the Sloot Digital Coding System, a revolutionary advance in data transmission that, he claimed, could reduce a feature-length movie down to a filesize of just 8KB. The decoding algorithm was 370MB, and apparently Sloot demonstrated this to Philips execs, dazzling them by playing 16 movies at the same time from a 64KB chip. After getting a bunch of investors, he mysteriously died on September 11, 1999"
Is this possible, or is it just a story?
There are two views on the story of the Sloot Digital Coding System. They are incompatible: In one view it is impossible, in the other it is possible.
What is impossible?
To store every possible movie down to a file size of just 8KB. This boils down to the Pigeonhole principle.
A key of a limited length (whether it is a kilobyte or a terabyte) can only store a limited number of codes, and therefore can only distinguish a finite number of movies. However, the actual number of possible movies is infinite. For, suppose it were finite; in that case there would be a movie that is the longest. By just adding one extra image to the movie, I would have created a longer movie, which I didn't have before. Ergo, the number of possible movies is infinite. Ergo, any key of limited length cannot distinguish every possible movie.
The SDCS is only possible if keys are allowed to become infinite, or the data store is allowed to become infinite (if the data store already contains all movies ever made, a key consisting of a number can be used to select the movie you want to see -- however, in that case it is impossible to have keys for movies that have not been made yet at the time the data store was constructed). This would, of course, make the idea useless.
Pieter Spronck
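The counting side of the argument is easy to make concrete:

```python
# An n-byte key can take at most 256**n distinct values, so it can
# distinguish at most that many movies: a huge but finite number.
def distinct_keys(n_bytes: int) -> int:
    return 256 ** n_bytes

assert distinct_keys(1) == 256         # one-byte keys: 256 movies at most
eight_kb = distinct_keys(8 * 1024)     # astronomically large...
assert eight_kb < distinct_keys(8 * 1024 + 1)  # ...but still finite
```

Since the set of possible movies is unbounded, no fixed key length can give every movie its own key; that is the pigeonhole principle at work.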
What is possible?
To store or load a finite number of feature-length movies on a device and be able to unlock them with an 8KB key.
Then it is not so much about compression as about encoding / databases / data transmission. This is a change in distribution model: why ship software/data at a later time over a telephone line or on a DVD, when you can pre-store it during fabrication, or pipe it all at once at intervals? This model is pretty close to how phones come with pre-loaded apps, or how some games let you unlock new game elements by entering a key.
The Sloot patents never claim feature-length movie -> 8KB data compression. They claim an 8x compression rate.
It is not about compression. Everyone is mistaken about that. The principle can be compared with a concept like Adobe PostScript, where sender and receiver know what kinds of data recipes can be transferred, without the data itself actually being sent.
- Roel Pieper
In this view SDCS is a primitive form of DRM that would reduce the bandwidth needed to get access to a certain piece of pre-stored data to an 8KB key.
Imagine storing that month's popular movies by bringing your device to your local video store. Then, when you want to see an available movie, you just call for your key, or buy a chipcard at the gas station. Now we have enough bandwidth for streaming Netflix, but back in the late 90s we were on dial-up and there was a billion-dollar data transmission industry (DVDs, CDs, video tapes, floppies, hard disks).
Was playing 16 movies at once possible?
This is unverified. Though many investors claim to have seen the demonstration. These people worked for respected companies like Philips, Oracle, Endemol, 'Kleiner, Perkins, Caufield and Byers'. I'd say it is not impossible, but await more verification.
A very interesting concept. Conceptually, the Sloot encoding premise seems to be that the "receiver" would have a heavy, data-rich (DRM-like) program with a large set of pre-programmed capabilities, ready and able to execute complex programming tasks with minimal data instruction.
I am not a programmer; however, current data transfer challenges seem to put more focus on the "transmission" of the data (dense and voluminous) than on the capability of the receiving program/hardware. With Sloot, the emphasis is on the pre-loading of such data (with hardware/software that has much higher capabilities built in). I hope I'm not stating the obvious here.
As an example, using sound files for simplicity, rather than sending a complex sound file containing say an Mp3 of Vivaldi – The Four Seasons, the coding just instructs the receiver the "musical notes" of the composition, where the system is pre-programmed to play the notes. Obviously there is more to it than that, however, the concept makes perfect sense. In other words rather than transmitting "Vivaldi" data rich signal, send simpler instructions to a "Vivaldi" trained receiver. Don't send the composer, send the instructions to a composer already there.
Yes, movies can contain billions of instructional data points under the current system (and that of 1999); however, can beefing up the abilities, the pre-programmed functions, of the receiver achieve what Sloot had figured out?
Currently, the data stream seems to be carrying the load, where instead the receiver should be, as suggested by Sloot. So, does it make more sense to send the music composer by train to the concert hall across the country, or to send the music notes to another composer who is already there? This is not to be confused with pre-loaded movies being "unlocked"; rather, the movie player would have such broad built-in abilities that simple coding could instruct it with an order of magnitude less data.
Just some random thoughts from a layman.

How to create suggestions in a database?

I want to develop a small musical library. My objective is to add a notion of suggestions for users:
A user adds music to the application; he is not logged in at all, it's anonymous.
When a user opens or closes the application, we send his library to our database, to collect (only) new music track information.
When a user clicks on suggestions, I want to check the database and compare his library with it. I want to find the music that users like him (users who listen to the same music as he does) listen to.
My idea was to create a link between two tracks that defines the percentage of users who have both tracks. If this percentage is high, we can suggest the second track to the users who listen to the first one.
I need some help finding documentation about that type of database, without any user ID. I have to compare a user's library with a big list of music. I've found that this is item-based recommendation. Am I on the right track?
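The track-pair link described above can be sketched like this (track names and libraries are made up):

```python
from itertools import combinations

def pair_percentages(libraries: list) -> dict:
    """For each pair of tracks, the share of users owning the first track
    who also own the second: the 'link' between the two tracks."""
    tracks = set().union(*libraries)
    links = {}
    for a, b in combinations(sorted(tracks), 2):
        has_a = [lib for lib in libraries if a in lib]
        if has_a:
            links[(a, b)] = sum(b in lib for lib in has_a) / len(has_a)
    return links

users = [{"t1", "t2"}, {"t1", "t2", "t3"}, {"t1"}]
links = pair_percentages(users)
# 2 of the 3 users with t1 also have t2, so links[("t1", "t2")] is 2/3
```

This is indeed the core of item-based collaborative filtering: the pair table can be precomputed offline and consulted at suggestion time.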
Whether a user listens to a particular song or has it in his/her library can be misleading. Lots of times, sample music will come with an operating system or music player and the user just doesn't care enough to remove it, or lots of times it can be hard for a machine to determine the difference between music and other sounds. Or maybe somebody has some music they downloaded because it seemed interesting on paper or came on an album that they liked as a whole, but they actually ended up not liking that song, but again didn't delete it.
One time I set Windows Media player to shuffle all the music on my computer, and to my surprise, I heard punch sound effects, music I had never heard before (from artists I had never heard of, in genres I didn't listen to), and even Windows click sounds that confused me as I wasn't clicking anything.
I say all that to point out that you might want to put more thought into it than which users appear to listen to the same music. Maybe you could have users rate the songs they listen to, and compare not only the songs in their libraries but their ratings of the songs. If two users have all the same songs but one user hates all the songs that the other likes and vice-versa, they really don't have similar tastes.
I would define a UDF that compares two users' tastes: take each song user 1 has, ignore it if user 2 doesn't have it, and otherwise subtract the absolute difference of their ratings from the maximum rating; then add all these values together.
Then I would run this UDF for each pair of one user to another and pick the top few, then suggest the songs that they have highly-rated.
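A sketch of that UDF's logic in Python (the maximum rating and the sample users are assumptions):

```python
MAX_RATING = 5

def taste_similarity(ratings1: dict, ratings2: dict) -> int:
    """Sum, over songs both users rated, of MAX_RATING minus the absolute
    difference of their ratings; shared songs with close ratings score high."""
    return sum(
        MAX_RATING - abs(r1 - ratings2[song])
        for song, r1 in ratings1.items()
        if song in ratings2
    )

def top_suggestors(user: dict, others: dict, k: int = 3):
    """The k most similar users, to pull high-rated suggestions from."""
    ranked = sorted(others, key=lambda u: taste_similarity(user, others[u]),
                    reverse=True)
    return ranked[:k]

alice = {"s1": 5, "s2": 1}
others = {"bob": {"s1": 5, "s2": 1}, "carol": {"s1": 1, "s2": 5}}
# bob matches alice exactly (score 10); carol's tastes are opposite (score 2)
```

Note that two users with identical libraries but opposite ratings score low, which is exactly the point made above.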
This will take a long time, particularly if you have a large number of users, so what you can also do is make a Suggestors table that stores each user's most similar users, and update (that is, truncate and then rebuild) it via the above process daily, weekly, monthly, whatever fits your situation. The suggestions feature (when used by the user) would then only need to check the user's suggestors' high-rated songs, which would take substantially less time but would keep things fairly up to date with additions and changes to users' libraries.

Which is better: sending many small messages or fewer large ones?

I have an app whose messaging granularity could be written two ways: sending many small messages vs. (possibly far) fewer larger ones. Conceptually, what moves around is a set of 'alive' vertex IDs that might get filtered at each superstep based on a processed list (the vertex value) that vertices maintain. The ones that survive to the end are the lucky winners. compute() calculates a set of 'new-to-me' incoming IDs that are perfect for the outgoing message, but I could easily send each ID one at a time. My guess is that sending fewer messages is more important, but then each set might contain thousands of IDs. Thank you.
P.S. A side question: The few custom message type examples I've found are relatively simple objects with a few primitive instance variables, rather than collections. Is it nutty to send around a collection of IDs as a message?
I have used lists and even maps as message or vertex data, so that isn't a problem. I don't think it matters to Giraph which one you choose, but I'd rather go with many simple small messages, as that uses Giraph the way it is intended. Otherwise, in the compute function you will need to go through the list of messages and, for each message, through its list of IDs.
Performance-wise it shouldn't make much difference. What I have found to make a big difference is to compute as much as possible in one cycle, as the switching between cycles and the synchronising of messages takes a lot of time. As long as that doesn't change, it should be more or less the same, and probably much easier to read and maintain if you keep the size of messages small.
In order to answer your question, you need to understand the MessageStore interface and its implementations.
In a nutshell, under the hood, the following steps take place:
The worker receives the raw bytes of the messages and the destination IDs.
The worker sorts the messages and puts them into a map of maps. The first map's key is the partition ID; the second map's key is the vertex ID. (It is a bit like a post office: the worker is the central hub, and it sorts the letters first by zip code, then within each zip code by address.)
When it is a vertex's turn to compute, an Iterable of that vertex's messages is passed to its compute method, and that's where you get the messages and use them.
So fewer, bigger messages are better, because there is less sorting when the total number of bytes is the same in both cases.
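The post-office sort can be sketched like this; the partition function is a made-up stand-in for Giraph's partitioner, and the whole thing is a Python illustration rather than Giraph code:

```python
from collections import defaultdict

def sort_messages(incoming, partition_of):
    """Post-office sort: first by partition ID ('zip code'), then by
    destination vertex ID ('street address')."""
    store = defaultdict(lambda: defaultdict(list))
    for vertex_id, message in incoming:
        store[partition_of(vertex_id)][vertex_id].append(message)
    return store

incoming = [(7, "a"), (12, "b"), (7, "c")]
store = sort_messages(incoming, partition_of=lambda v: v % 10)
# store[7][7] == ["a", "c"]; store[2][12] == ["b"]
```

Each message sent is one entry to route and append, which is why one message carrying a thousand IDs is cheaper to sort than a thousand messages carrying one ID each.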
Also, you can send many small messages but let Giraph convert them into one long message (almost) automatically, using Combiners.
The documentation on this subject on the Giraph site is thin, but you may be able to extract an example from the book Practical Graph Analytics with Apache Giraph.
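The effect of a combiner can be sketched in Python (Giraph combiners are Java classes; this only shows the merge semantics, with made-up IDs):

```python
def combine(existing_ids: list, new_ids: list) -> list:
    """Merge two messages bound for the same vertex into one, the way a
    combiner collapses traffic before it is sent or stored."""
    return existing_ids + [i for i in new_ids if i not in existing_ids]

# Three small messages to the same vertex collapse into a single one:
combined = []
for message in [[1, 2], [2, 3], [4]]:
    combined = combine(combined, message)
# combined == [1, 2, 3, 4]
```

You write the small-message version of the algorithm, and the combiner gives you most of the sorting benefit of the big-message version.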
Mainly, this depends on the type of messages that you are sending.
