I have 200 groups. Each group has 100 devices, i.e. a total of 20000 devices divided into 200 groups of 100 each.
Now when each device gets registered with the server, the server assigns a group id to the device (all 100 devices in a group share the same group id). At a later stage the server sends multicast data with the group id, so that the data is received by all the devices having that group id.
The problem is that I need to allocate a single chunk of memory (say 25 bytes) for each group to store the data, so that all the devices in that group will use that chunk for their processing. My idea is to allocate one big chunk (say 25 * 200 = 5000 bytes) and assign each group a 25-byte block (grp0 points to the start address, grp1 points to start+25, and so on).
Is this the best way? Any other ideas?
For your example, I would use an array.
Provided the number of your clients does not change, allocating a single block is the most efficient way:
You do a single malloc call instead of 100
You avoid the overhead associated with the list that tracks every memory block allocation
Your data is kept in one piece, which makes it more easily cacheable by the processor cache, compared to 100 small blocks placed god-knows-where
That said, the difference with just 100 elements is probably negligible, but multiplied by 200 groups it can give you a performance boost (it really depends on how you are using the data structure)
If you have a dynamic structure instead (for example, your clients connect and disconnect, so they are not always 100), you should use a linked list, which allocates memory as needed (so you end up with 100 different memory blocks).
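For the fixed-size case, here is a minimal sketch of the single-block idea (the sizes are taken from the question; the names are illustrative):

#include <stdlib.h>

#define NUM_GROUPS 200
#define CHUNK_SIZE 25

int main(void)
{
    /* one malloc instead of one allocation per group */
    unsigned char *pool = malloc((size_t)NUM_GROUPS * CHUNK_SIZE);
    if (pool == NULL)
        return 1;

    /* group g's 25-byte block simply starts at pool + g * CHUNK_SIZE */
    unsigned char *grp0 = pool;                  /* start address    */
    unsigned char *grp1 = pool + 1 * CHUNK_SIZE; /* start + 25, etc. */
    (void)grp0; (void)grp1;

    free(pool); /* one free releases every group's block at once */
    return 0;
}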
As stated by ArjunShankar, you will take O(1) time to access a device within a group. That's not bad, assuming you don't have to do much processing to find a specific device (assuming you have to find it at all). If you're planning to process them simultaneously and the number grows large (or your available memory is limited), you should take a look at techniques such as disk pagination.
My question is not about the query language but about the physical distribution of data in a graph database.
Let's assume a simple user/friendship model. In RDBs you would create a table storing IDUserA/IDUserB to represent a friendship.
If we assume a bunch of IT-Girls, for example, with the Facebook limit of 5k friends, we quickly get to huge amounts of data. If GirlA (ID 1) simply likes GirlB (ID 2), there would be an entry [1][2] in the table.
With this model it is not possible to avoid data redundancy in friendships: either we do two queries (is there an entry with ID = 1 in IDUserA, or one in IDUserB? That means physically searching both columns), or we store both [1][2] and [2][1], which ends up in data redundancy. For a heavy user this means checks against 5,000/10,000 entries in an indexed column, which is astronomically big.
So OK, use GraphDBs. We model the girls as nodes. GirlA is the first one ever entered into the DB, so her ID is simply 0. The entry contains an isUsed flag in a one-byte data chunk, which is 1 if the record is in use. The next 4 bytes name the file where her node is stored (which allows nearly 4.3 billion possible files), and if we assume a file size of 16.7 MB we can use 3 more bytes to declare the offset inside that file.
Let's assume we define the username datatype as a chunk of 256 bytes (and be, for the example, that rigid).
For GirlA it is [1]0.0.0.0-0.0.0
= Her User ID 0 times 256 = 0
For GirlB it is [1]0.0.0.0-0.1.0
= Her User ID 1 times 256 = 256,
so her username data starts in file 0_0_0_0.dat at offset 256 from the start. We don't have to search for her data; we can simply calculate its location. User 100 would be stored in the same file at offset 25,600, and so forth and so on. User 65,536 would be stored in file 0_0_0_1.dat at offset 0 (each file holds 2^24 / 256 = 65,536 records). Loaded into RAM this is only a pointer and pretty fast.
So with this method we could store more nodes than humans have ever lived.
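As a sketch, the lookup described above is pure arithmetic (the numbers follow the 3-offset-byte / 256-byte-record scheme; the file-name formatting is simplified to the last name component):

#include <stdio.h>

int main(void)
{
    unsigned user_id = 65536;
    unsigned records_per_file = (1u << 24) / 256;          /* 3 offset bytes -> 65,536 records */
    unsigned file_no = user_id / records_per_file;         /* -> file 1   */
    unsigned offset  = (user_id % records_per_file) * 256; /* -> offset 0 */
    printf("user %u -> file 0_0_0_%u.dat, offset %u\n", user_id, file_no, offset);
    return 0;
}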
BUT: how do we find relationships? OK, with edges. But how do we store them? All in one "column" is stupid, because then we are back at relational models. In a hash table? OK, we could store 0_0_0_0.frds as a hash table containing all friends of User0, kick off a new instance of a User class object, add the friends to a binary list or tree reachable via the pointer cUser.pFriendlist, and we would be done. But I think I am making a mistake.
Shouldn't graph databases be something other than mathematical nodes connected via hash tables filled with edges?
The use of nodes and edges is clear, because it allows connecting anything with relationships to anything. But what about the queries and their speed?
Keeping different edges in different types of files seems somehow wrong, even if the access is really fast on SSDs.
Sure, I could use a simple relational table to store an edge-type/data-ending pair, but please help me: where am I going wrong?
I am new to the mainframe world and trying to learn it, but I am unable to understand one thing: how are extents allocated in data sets?
Please, can someone explain it using an example, or answer the questions below?
Suppose there is a sequential data set where primary and secondary are both allocated 1 track.
How many times can this data set request an extent?
Is the extent allotted to both primary and secondary, or only to secondary?
And one last question
How does setting or not setting the guaranteed space attribute in the storage class affect the number of extents that can be requested?
Thank You
Sequential Data Allocation
A sequential data set with a primary and secondary allocation of 1 track each will be able to have 16 extents if it is allocated on one volume:
//stepname EXEC PGM=IEFBR14
//ddname DD DSN=dataset,
// DISP=(NEW,CATLG),
// UNIT=SYSALLDA,SPACE=(TRK,(1,1))
/*
The above will allocate a data set that can grow to 16 tracks if extended by being written to.
If you replace SYSALLDA with (SYSALLDA,2), it will be able to use 2 volumes, so it can reach 32 tracks in size across the 2 volumes.
The number of volumes can be overridden by the data class (DATACLAS), which can be assigned to SMS-managed data sets.
Guaranteed space
Guaranteed space allows you to specify the actual volumes that a data set will be allocated on when the allocation is SMS-controlled; normally SMS picks the volumes based on the ACS routines.
The JCL below will allocate the data set on volume VOL001 if the storage class has the DCGSPAC attribute:
//stepname EXEC PGM=IEFBR14
//ddname DD DSN=dataset,
// DISP=(NEW,CATLG),VOL=SER=VOL001,
// STORCLAS=GSPACE,
// UNIT=SYSALLDA,SPACE=(TRK,(1,1))
/*
Normally the SMS routines are coded so that only specific users or jobs are allowed to use storage classes with Guaranteed Space
Explanation of Storage Class
My system is supposed to write a large amount of data into a DynamoDB table every day. These writes come in bursts, i.e. at certain times each day several different processes have to dump their output data into the same table. Speed of writing is not critical as long as all the daily data gets written before the next dump occurs. I need to figure out the right way of calculating the provisioned capacity for my table.
So for simplicity let's assume that I have only one process writing data once a day, and it has to write up to X items into the table (each item < 1KB). Is the capacity I would have to specify essentially equal to X / 24 / 3600 writes/second?
Thx
The provisioned capacity is in terms of writes/second. You need to make sure that you can handle the PEAK number of writes/second that you expect, not the average over the day. So, if you have a single process that runs once a day and makes X writes of size Y (in KB, rounded up) over Z seconds, your formula would be
capacity = (X * Y) / Z
So, say you had 100K writes over 100 seconds and each write < 1KB, you would need 1000 w/s capacity.
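In code, the same calculation looks like this (a sketch of the formula above; the numbers are the worked example, not an AWS API):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x_writes  = 100000;    /* X: writes in the burst                 */
    double y_item_kb = ceil(0.8); /* Y: item size in KB, rounded up -> 1    */
    double z_seconds = 100;       /* Z: duration of the burst               */
    double capacity = (x_writes * y_item_kb) / z_seconds;
    printf("required capacity: %.0f writes/second\n", capacity); /* 1000 */
    return 0;
}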
Note that in order to minimize provisioned write capacity needs, it is best to add data into the system on a more continuous basis, so as to reduce peaks in necessary read/write capacity.
Over the connections that most people in the USA have in their homes, what is the approximate length of time to send a list of 200,000 integers from a client's browser to an internet server (say, Google App Engine)? Does it change much if the data is sent from an iPhone?
How does the length of time increase as the size of the integer list increases (say, with a list of a million integers)?
Context: I wasn't sure if I should write code to do some simple computations and sorting of such lists for the browser in JavaScript or for the server in Python, so I wanted to explore how long it takes to send the output data from a browser to a server over the web, to help me decide whether the client's browser or the App Engine server is the better place for such computations.
More Context:
Type of Integers: I am dealing with 2 lists of integers. One is a list of ids for the 200,000 objects, whose integers look like {0,1,2,3,...,99,999}. The second list of 100,000 is just single digits {...,4,5,6,7,8,9,0,1,...}.
Type of Computations: from the browser, a person will create her own custom index (or ranking) based on changing the weights associated with about 10 variables referenced to the 100,000 objects: INDEX = w1*Var1 + w2*Var2 + ... + wN*VarN. So the computations involve multiplying a vector (array) by a scalar and adding 2 vectors, as well as sorting the final INDEX vector of 100,000 values.
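To make that concrete, the computation would look something like this sketch (sizes and weights are placeholders from the description above; in practice one would sort object ids by their index value rather than the bare values):

#include <stdlib.h>

#define N_OBJECTS 100000
#define N_VARS 10

static int cmp_desc(const void *a, const void *b)
{
    double d = *(const double *)b - *(const double *)a;
    return (d > 0) - (d < 0);
}

void rank(const double vars[][N_VARS], const double weights[], double index[])
{
    for (int i = 0; i < N_OBJECTS; i++) {
        index[i] = 0.0;
        for (int v = 0; v < N_VARS; v++)
            index[i] += weights[v] * vars[i][v]; /* INDEX = w1*Var1 + ... + wN*VarN */
    }
    qsort(index, N_OBJECTS, sizeof index[0], cmp_desc); /* sort the final vector */
}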
In a nutshell...
This is probably a bad idea, in particular for mobile devices where, aside from the delay associated with the transfer(s), the limits and/or extra fees associated with monthly volumes exceeding various plan limits make this a lousy economic option...
A rough estimate (more info below) is that the one-way transmission takes between 0.7 and 5 seconds.
There is a lot of variability in this estimate, due mainly to two factors:
Network technology and plan
Compression ratio that can be obtained for 200k integers.
Since the network characteristics are more or less a given, the most significant improvement would come from the compression ratio. This in turn depends greatly on the statistical distribution of the 200,000 integers. For example, if most of them are smaller than, say, 65,000, it is quite likely that the list would compress to about 25% of its original size (75% size reduction). The time estimates provided assume only a 25 to 50% size reduction.
Another network consideration is the availability of a binary MIME extension (8-bit MIME), which would avoid the 33% overhead of B64, for example.
Other considerations / ideas:
This type of network usage will not fare very well with iPhone / mobile device plans!!!
AT&T will love you (maybe); your end-users will hate you, at least the ones with plan limits, which many (most?) have.
Rather than sending one big list, you could split the list over 3 or 4 chunks, allowing the server-side sorting to take place [mostly] in parallel with the data transfer.
One gets a better compression ratio for integers when they are [roughly] sorted, so maybe you can do a first-pass sort of some kind client-side.
How do I figure? ...
1) Amount of data to transfer (one-way)
200,000 integers
= 800,000 bytes (assumes 4-byte integers)
= 400,000 to 600,000 bytes compressed (you'll want to compress!)
= 533,000 to 800,000 bytes in B64 format for MIME encoding
2) Time to upload (varies greatly...)
Low-end home setup (ADSL) = 3 to 5 seconds
broadband (eg DOCSIS) = 0.7 to 1 second
iPhone = 0.7 to 5 seconds, possibly worse; possibly a bit better with a high-end plan
3) Time to download (back from server, once list is sorted)
Assume same or slightly less than upload time.
With portable devices, the differential is more notable.
The question is unclear about what would have to be done with the resulting (sorted) array, so I didn't worry too much about the "return trip".
==> Multiply by 2 (or 1.8) for a safe estimate of a round trip, or inquire about the specific network/technology.
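The arithmetic above, as a sketch (the ~1.5 Mbps ADSL uplink is an assumption chosen to land in the 3-5 second range quoted):

#include <stdio.h>

int main(void)
{
    double n = 200000;
    double raw = n * 4;                                    /* 4-byte integers -> 800,000 B */
    double comp_lo = raw * 0.50, comp_hi = raw * 0.75;     /* 25-50% size reduction        */
    double b64_lo = comp_lo * 4 / 3, b64_hi = comp_hi * 4 / 3; /* +33% B64/MIME overhead   */
    double uplink = 1.5e6 / 8;                             /* ~1.5 Mbps ADSL uplink, B/s   */
    printf("B64 payload: %.0f to %.0f bytes\n", b64_lo, b64_hi);
    printf("upload time: %.1f to %.1f s\n", b64_lo / uplink, b64_hi / uplink);
    return 0;
}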
By default, integers are typically stored as a 32-bit value, or 4 bytes. 200,000 integers would then be 800,000 bytes, or 781.25 kilobytes. It would depend on the client's upload speed, but at 640 Kbps upload, that's about 10 seconds.
Well, that is 800,000 bytes, or 781.3 KB; you could say the size of a normal JPEG photo. On broadband, that would be a matter of seconds, and you could always consider compression (there are libraries for this).
The time increases linearly with the amount of data.
Since you're sending the data from JavaScript to the server, you'll be using a text representation. The size will depend a lot on the number of digits in each integer. Are we talking about 200,000 two-to-three-digit integers or six-to-eight-digit integers? It also depends on whether HTTP compression is enabled and whether Safari on the iPhone supports it (I'm not sure).
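As an illustrative calculation (assuming a comma-separated decimal encoding; the digit counts are placeholders):

#include <stdio.h>

int main(void)
{
    long n = 200000;
    long small = n * (3 + 1); /* 3-digit integers + separator ->   800,000 B */
    long large = n * (8 + 1); /* 8-digit integers + separator -> 1,800,000 B */
    printf("~%ld bytes vs ~%ld bytes\n", small, large);
    return 0;
}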
The amount of time will be linear in the size. Typical upload speeds on an iPhone will vary a lot depending on whether the user is on business wifi, public wifi, home wifi, 3G, or the EDGE network.
If you're that dependent on performance, perhaps this is more appropriate for a native app than an HTML app. Even if you don't do the calculations on the client, you can send/receive binary data and compress it, which will reduce the time.
I assume 100 bytes is too small and can slow down larger file transfers with all of the writes, but something like 1MB seems like it may be too much. Does anyone have suggestions for an optimal number of bytes per write for sending data over a network?
To elaborate a bit more, I'm implementing something that sends data over a network connection and shows the progress of that data being sent. I've noticed that if I send large files at about 100 bytes per write, it is extremely slow, but the progress bar works out very nicely. However, if I send at, say, 1MB per write, it is much faster, but the progress bar doesn't work as nicely due to the larger chunks being sent.
No, there is no universal optimal byte size.
TCP packets are subject to fragmentation, and while it would be nice to assume that everything from here to your destination is true Ethernet with huge packet sizes, the reality is that even if you could get the packet sizes of all the individual networks one of your packets crosses, each packet you send out may take a different path through the internet.
It's not a problem you can "solve" and there's no universal ideal size.
Feed the data to the OS and TCP/IP stack as quickly as you can, and it'll dynamically adapt the packet size to the network connections (you should see the code used for this optimization; it's really, really interesting, at least on the better stacks).
If you control all the networks and stacks in use between your clients and servers, though, then you can do some hand-tuning. But generally, even then, you'd have to have a really good grasp of the network and the data you're sending before I'd suggest you try it.
If you can, just let the IP stack handle it; most OSes have a lot of optimization already built in. Vista, for example, will dynamically alter various parameters to maximize throughput; second-guessing the algorithm is unlikely to be beneficial.
This is especially true in higher-order languages, far from the actual wire, like C#; there are enough layers between you and actual TCP/IP packets that I would expect your code to have relatively little impact on throughput.
At worst, test various message sizes in various situations for yourself; few solutions are one-size-fits-all.
If you are using TCP/IP over Ethernet, the maximum packet size is about 1500 bytes. If you try to send more than that at once, the data will be split up into multiple packets before being sent out on the wire. If the data in your application is already packetized, then you might want to choose a packet size of just under 1500 so that when you send a full packet, the underlying stack doesn't have to break it up. For example, if every send you do is 1600 bytes, the TCP stack will have to send out two packets for each send, with the second packet being mostly empty. This is rather inefficient.
Having said that, I don't know how much of a visible impact on performance this will have.
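To make the 1500-byte figure concrete (a sketch assuming typical 20-byte IP and 20-byte TCP headers, no options):

#include <stdio.h>

int main(void)
{
    int mtu = 1500, ip_hdr = 20, tcp_hdr = 20;
    int mss = mtu - ip_hdr - tcp_hdr;          /* 1460 bytes of payload    */
    int send_size = 1600;
    int packets = (send_size + mss - 1) / mss; /* -> 2 packets: 1460 + 140 */
    printf("MSS = %d; a %d-byte send needs %d packets\n", mss, send_size, packets);
    return 0;
}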
Make a function named CalcChunkSize
Add some private variables to your class:
Private PreferredTransferDuration As Integer = 1800 ' milliseconds, the timespan the class will attempt to achieve for each chunk, to give responsive feedback on the progress bar.
Private ChunkSizeSampleInterval As Integer = 15 ' interval to update the chunk size, used in conjunction with AutoSetChunkSize.
Private ChunkSize As Integer = 16 * 1024 ' 16k by default
Private StartTime As DateTime
Private MaxRequestLength As Long = 4096 ' default, this is updated so that the transfer class knows how much the server will accept
Before every download of a chunk, check whether it's time to calculate a new chunk size, using ChunkSizeSampleInterval:
Dim currentIntervalMod As Integer = numIterations Mod Me.ChunkSizeSampleInterval
If currentIntervalMod = 0 Then
Me.StartTime = DateTime.Now
ElseIf currentIntervalMod = 1 Then
Me.CalcChunkSize()
End If
numIterations is set to 0 outside the download loop and incremented with numIterations += 1 after every downloaded chunk.
Have CalcChunkSize do this:
Protected Sub CalcChunkSize()
' chunk size calculation is defined as follows
' * in the examples below, the preferred transfer time is 1500ms, taking one sample.
' *
' * Example 1 Example 2
' * Initial size = 16384 bytes (16k) 16384
' * Transfer time for 1 chunk = 800ms 2000 ms
' * Average throughput / ms = 16384b / 800ms = 20.48 b/ms 16384 / 2000 = 8.192 b/ms
' * How many bytes in 1500ms? = 20.48 * 1500 = 30720 bytes 8.192 * 1500 = 12288 bytes
' * New chunksize = 30720 bytes (speed up) 12288 bytes (slow down from original chunk size)
'
Dim transferTime As Double = DateTime.Now.Subtract(Me.StartTime).TotalMilliseconds
Dim averageBytesPerMilliSec As Double = Me.ChunkSize / transferTime
Dim preferredChunkSize As Double = averageBytesPerMilliSec * Me.PreferredTransferDuration
Me.ChunkSize = CInt(Math.Min(Me.MaxRequestLength, Math.Max(4 * 1024, preferredChunkSize)))
' set the chunk size so that it takes ~1500ms per chunk (estimate), not less than 4KB and not greater than 4MB (note: 4096KB sometimes causes problems, probably due to the IIS max request size limit; choosing a slightly smaller max of 4 million bytes seems to work nicely)
End Sub
Then just use ChunkSize when requesting the next chunk.
I found this in "Sending files in chunks with MTOM web services and .Net 2.0" by Tim_mackey and have found it very useful myself for dynamically calculating the most effective chunk size.
The full source code is here: http://www.codeproject.com/KB/XML/MTOMWebServices.aspx
And the author is here: http://www.codeproject.com/script/Membership/Profiles.aspx?mid=321767
I believe your problem is that you use blocking sockets and not non-blocking ones.
When you use blocking sockets and you send 1M of data, the network stack can wait for all of the data to be placed in a buffer; if the buffers are full, you'll be blocked, and your progress bar will wait for the whole 1M to be accepted into the buffers. This may take a while and will make your progress bar jumpy.
If, however, you use non-blocking sockets, whatever buffer size you use will not block, and you will need to do the waiting yourself with select/poll/epoll/whatever-works-on-your-platform (select is the most portable, though). In this way your progress bar will be updated quickly and reflect the most accurate information.
Do note that at the sender the progress bar is partially broken anyway, since the kernel will buffer some data and you will reach 100% before the other side has really received the data. The only way around this is if your protocol includes a reply stating the amount of data received by the receiver.
As others said, second-guessing the OS and the network is mostly futile. If you keep using blocking sockets, pick a size large enough to include more data than a single packet, so that you don't send too little data per packet, as this would reduce your throughput needlessly. I'd go with something like 4K to include at least two packets at a time.
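A minimal sketch of that non-blocking approach (POSIX sockets; fd is assumed to be an already-connected socket):

#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

int send_with_progress(int fd, const char *buf, size_t len)
{
    /* switch the socket to non-blocking mode */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;
            printf("progress: %zu/%zu bytes\r", sent, len); /* update the bar */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set wfds; /* buffers full: wait until the socket is writable */
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0)
                return -1;
        } else {
            return -1; /* real error */
        }
    }
    return 0;
}

Note, as above, that "progress" here still means "accepted by the kernel", not "received by the peer".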
The one thing I will add is that for a given Ethernet connection it takes about as long to send a small packet as a large one. As others have said: if you're just sending a stream of data, let the system handle it. But if you're worried about individual short messages back and forth, a typical Ethernet packet is about 1500 bytes; as long as you keep it under that, you should be good.
You'd need to use Path MTU Discovery, or use a good default value (i.e. less than 1500 bytes).
One empirical test you can do, if you haven't already, is of course to use a sniffer (tcpdump, Wireshark, etc.) and look at what packet sizes are achieved when using other software for uploading/downloading. That might give you a hint.
Here is the formula you need:
int optimalChunkSize = totalDataSize / progressBar1.Width;
Using this, each chunk you send will increment the progress bar by 1 pixel. A smaller chunk size than this is pointless, in terms of user feedback.