I am using AWS DynamoDB.

Option 1:

[{"id":"01","available":true},
{"id":"02","available":true},
{"id":"03","available":false},
{"id":"04","available":true},
{"id":"05","available":false}]

Option 2:

"available":[true,true,false,true,false]

The ids are always sequential (01, 02, …), so I think it is a waste to include "id" as an attribute. With option 2 I can just update "available" using {id-1} as the array index. But I am not sure whether option 2 would cause any other issues. I originally used option 1 and checked that the id was correct before each update; I am afraid option 2 makes mistakes easier.

Which structure do you think is better?
Personally I prefer to use the Map type in DynamoDB, because it lets you update by key instead of guessing which index you need in an array. That would be option 3:

"mymap":{
"id01":{"available":true},
"id02":{"available":true},
"id03":{"available":true},
"id04":{"available":true},
"id05":{"available":true}
}

This lets you modify elements without first working out their position in an array, which otherwise sometimes requires reading the item first and can cause concurrency issues.
I notice you mention that the id maps directly to the position in the array; even so, I feel the map is a more fool-proof general implementation. For example, if you ever need to remove a value from the middle of the list, it causes no issues.
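To make option 3 concrete, here is a minimal Go sketch of the pieces you would hand to DynamoDB's UpdateItem: an update expression that addresses the map key directly, with no array index. The map name "mymap" follows the example above; I have spelled the attribute "available", and the table/key wiring is whatever your schema uses, so treat all names as placeholders.

```go
package main

import "fmt"

// buildMapUpdate returns the UpdateExpression (and the boolean to bind to
// the :v placeholder) for flipping one flag inside the "mymap" attribute.
// With the AWS SDK, the expression goes into UpdateItemInput.UpdateExpression
// and the value into ExpressionAttributeValues as a BOOL.
func buildMapUpdate(key string, available bool) (string, bool) {
	return fmt.Sprintf("SET mymap.%s.available = :v", key), available
}

func main() {
	expr, v := buildMapUpdate("id03", false)
	fmt.Println(expr, v) // SET mymap.id03.available = :v false
}
```

Because the expression names the key, the write never depends on reading the item first to find an index.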
That is one thing that can influence your decision; the other two are item size and total storage.
If your item size is well under 1KB, you will have no issue using option 1 or 3, which only increase the item slightly compared to option 2. Just make sure the extra characters do not push your average item size over the next 1KB boundary, since that would increase your capacity consumption for write requests.
The other factor is total storage. DynamoDB provides a free tier of 25GB of storage. If you have millions of items and the extra attributes grow your storage substantially, then you may decide to use option 2.
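The 1KB boundary mentioned above is easy to sanity-check. A small sketch, assuming the standard pricing rule of one write capacity unit per 1KB of item size, rounded up:

```go
package main

import "fmt"

// writeUnits returns the write capacity units a standard write consumes:
// the item size rounded up to the next 1KB (1024 bytes here).
func writeUnits(itemBytes int) int {
	return (itemBytes + 1023) / 1024
}

func main() {
	fmt.Println(writeUnits(900))  // 1 WCU: extra attribute names cost nothing yet
	fmt.Println(writeUnits(1100)) // 2 WCUs: the extra characters pushed it over 1KB
}
```

So the longer attribute names of options 1 and 3 only matter once they tip an item across a 1KB step.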
I am working on an algorithm that can receive any number of 16-bit values (for instance 1000 16-bit values, all sensor data, so there is no particular pattern or repetition). I want to fit all of this data into an 8- or 10-byte array (every one of the 1000 16-bit values should be recoverable from the 10-byte array). The encoding should be such that I can also easily decode and read back each of the 1000 values.
I have thought of using the sin function, dividing the values by 100 so every data point fits in 8 bits (the 0-1 sin value range), but that only covers a small range of data, not a huge number of values.
Pardon me if I am asking for too much. I am just curious whether it is possible.
The answer to this question is rather obvious with a little knowledge of information theory. It is not possible to store that much information in so little memory; the data you are talking about simply contains too much information.
Some data, like repetitive data or data that follows some structure (such as constantly rising values), contains very little information. The task of a compression algorithm is to detect the structure or repetition and, instead of storing the raw data, store the rule for reproducing it.
In your case the data comes from sensors, and unless you are willing to lose a massive amount of information, you will not be able to compress it by a factor of the magnitude you are talking about (1000 × 2 bytes into 10 bytes). If your sensors produce more or less the same values all the time with only a little jitter, a good compression ratio can be achieved (though your question is far too broad to cover that here), but it will never be in the range of reducing 1000 values to 10 bytes.
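The impossibility can be made concrete with a simple counting argument:

```go
package main

import "fmt"

// totalBits returns how many bits a block of count fixed-width values carries.
func totalBits(count, bitsPerValue int) int {
	return count * bitsPerValue
}

func main() {
	input := totalBits(1000, 16) // 16000 bits of raw sensor data
	output := totalBits(10, 8)   // only 80 bits in a 10-byte array
	// A lossless code must give every possible input a distinct output.
	// There are 2^16000 possible inputs but only 2^80 distinct 10-byte
	// arrays, so by the pigeonhole principle almost no inputs are
	// representable, whatever encoding scheme is used.
	fmt.Println(input, output) // 16000 80
}
```

No cleverness in the encoder can get around this count; only discarding information (lossy compression) can.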
Process of extracting data:
I am analyzing 4000 to 8000 DICOM files using MATLAB code. The DICOM files are read using the dicomread() function. Each DICOM file contains 932*128 photon-count data coming from 7 detectors. While reading the DICOM files, I convert the data to double and store it in 7 cell-array variables (one per detector). So each cell contains 128*128 photon-counting data, and each cell array contains 4000 to 8000 cells.
Question:
When I save each variable separately, the size of each variable is 3GB, so for 7 variables it is 21GB. Saving them and reading them back takes an awfully long time (my computer has 4GB of RAM).
Is there a way to reduce the size of the variables?
Thanks.
A different data type will help. You can save the data as single (MATLAB's 32-bit float) instead of double, since DICOM files store it as float anyway (see http://northstar-www.dartmouth.edu/doc/idl/html_6.2/DICOM_Attributes.html; Graphic Data). This halves the size at no loss. You might want to expand back to double when doing operations on the data, to avoid inaccuracies creeping in.
Additional compression by saving as uint16 (a further 2x space saving) or even uint8 (4x) might be possible, but I would be wary of this: it might work great in all your test cases and then cause problems when you least expect it.
The cell array is not problematic in terms of speed or size; you will not gain (much) by switching to something else. Your data gobbles up the memory, not the cell array itself. If you wish, you can store the data in a 128x128x7x8000 single array; that should work just fine too.
But if the number of images (the 4000-8000) can increase at any point, resizing that array becomes a pretty costly operation in terms of space and time. Cell arrays are much easier to extend: 8k cell pointers to move around instead of 8k*115k ≈ 900M values.
Another option is to split the data into chunks. You probably don't need to work on all 4000 images at once. You can load 500 images, finish your work on them, move on to the next 500 images, and so on. The batch size obviously depends on your hardware and on the processing you do, but about 500 could be a pretty reasonable starting point.
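The batching idea is the same in any language; here is a Go sketch of the loop structure (batch size 500 from above; the per-batch DICOM work is left as a comment, since it depends on your analysis):

```go
package main

import "fmt"

// batches splits n items into [start, end) index ranges of at most size
// elements, so files can be loaded and processed a chunk at a time
// instead of holding everything in memory at once.
func batches(n, size int) [][2]int {
	var out [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	for _, b := range batches(4000, 500) {
		fmt.Printf("files %d..%d\n", b[0], b[1]-1)
		// load these 500 DICOM files, run the analysis,
		// save the results, then let the memory go.
	}
}
```

Only one batch's worth of data is ever resident, which is what keeps a 21GB dataset workable on a 4GB machine.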
I am trying to classify the events above as 1 or 0, where 1 is the lower values and 0 is the higher values. Usually the data does not look as clean as this. The approach I am currently taking is to have two different thresholds, so that to go from 0 to 1 the signal has to cross the 1-to-0 threshold and stay beyond it for 20 sensor values. This threshold is set to the highest value I receive minus ten percent of that value. I don't think a machine-learning approach will work, because I have too few features to work with and the implementation has to take up minimal code space. I am hoping someone can point me toward a known algorithm that applies well to this sort of problem; googling and checking my other sources isn't producing great results. The current implementation is very effective and the hardware isn't going to change.
Currently the approach I am taking is to have two different thresholds, so that in order to go from 0 to 1 it has to go past the 1-to-0 threshold and it has to stay above it for 20 sensor values.
Calculate the area on your graph of those 20 sensor values. If the area is greater than a threshold (perhaps half the peak value) assign it as 1, else assign it as 0.
Since your measurements are one unit wide (pixels, or sensor readings) the area ends up being the sum of the 20 sensor values.
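That sum-and-threshold rule fits in a few lines; a Go sketch (the half-the-peak threshold is just the suggestion above, tune it to your signal):

```go
package main

import "fmt"

// classify sums a window of consecutive sensor readings (the "area", since
// each reading is one unit wide) and assigns 1 if the area exceeds the
// threshold, else 0.
func classify(window []float64, threshold float64) int {
	area := 0.0
	for _, v := range window {
		area += v
	}
	if area > threshold {
		return 1
	}
	return 0
}

func main() {
	// 20 readings hovering near a peak of 11; threshold = half the peak
	// value times the window length.
	window := []float64{9, 10, 11, 10, 9, 10, 11, 10, 9, 10,
		11, 10, 9, 10, 11, 10, 9, 10, 11, 10}
	fmt.Println(classify(window, 20*11/2.0)) // area 200 > 110, so 1
}
```

Summing over the window also gives you noise immunity for free: a single spiked reading moves the area far less than it would move a single-sample comparison.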
I want to make an algorithm that enables A/B testing over a variable number of subjects with a variable number of properties per subject.
For example, I have 1000 people with the following properties: they come from two departments, some are managers, some are women, etc. These properties may increase or decrease depending on the situation.
I want the algorithm to split the population in two with the best possible representation of all the properties in both A and B. So I want two groups of 500 people, with equal numbers from both departments in each, equal numbers of managers, and equal numbers of women. More specifically, I would like to maintain the ratio of each property in both A and B: if 10% of the population are managers, I want 10% of sample A and of sample B to be managers.
Any pointers on where to begin? I am pretty sure such an algorithm exists. I have a gut feeling this may be unsolvable in some cases, since there may be an odd number of managers AND women AND Dept. 1.
Make a list of permutations of all a/b variables.
Dept1,Manager,Male
Dept1,Manager,Female
Dept1,Junior,Male
...
Dept2,Junior,Female
Go through all the people and assign them to their respective permutation. Maybe randomise the order of the people first just to be sure there is no bias in the order they are added to each permutation.
Dept1,Manager,Male-> Person1, Person16, Person143...
Dept1,Manager,Female-> Person7, Person10, Person83...
Have a second process that goes through each permutation and assigns half the people to one test group and half to the other. You will need to account for odd numbers of people in the group, but that should be fairly easy to factor in, obviously a larger sample size will reduce the impact of this odd number on the final results.
The algorithm for splitting the groups is simple: take each group of people who have all dimensions in common and assign half to the treatment and half to the control. You don't need to worry about odd numbers of people; whatever statistical test you use will account for that. If some dimension is heavily skewed (e.g., there are only 2 women in your entire sample), it may be wise to throw that dimension out.
Simple A/B tests usually use a t-test or g-test, but in your case you'd be better off using an ANOVA to determine the significance of the treatment on each of the individual dimensions.
I was wondering what some of the common pitfalls are that a novice Go programmer could fall into when unintentionally writing slow Go code.
1) First, I know that in Python string concatenation can be (or used to be) expensive; is the same true in Go when adding one string to another, as in "hello"+"World"?
2) The other issue is that I very often find myself having to extend my slice with a list of more bytes (rather than 1 byte at a time). I have a "dirty" way of appending, by doing the following:
newStr := string(arrayOfBytes) + string(newBytesToAppend)
Is that way slower than just doing something like this?
for _, b := range newBytesToAppend {
    arrayOfBytes = append(arrayOfBytes, b)
}
Or is there a better way to append whole slices to other slices, or maybe a built-in way? It just seems a little odd that I would even have to write my own extend function (or benchmark it).
Also, I sometimes end up looping through every element of the byte slice and, for readability, converting the current byte to a string, as in:
for _, b := range newBytesToAppend {
    c := string(b)
    // some more logic on c
    logic(c)
}
3) I was wondering whether type conversions in Go are expensive (especially between strings and byte slices) and whether that might be one of the factors making my code slow. By the way, I sometimes convert types (to strings) very often, nearly every iteration.
More generally, I tried searching the web for a list of things that often make Go code slow, so I could avoid them, but without much luck. I am well aware that this varies from application to application, but I was wondering if there is any "expert" advice on what usually makes "novice" Go code slow.
4) The last thing I can think of: sometimes I know the length of the slice in advance, so I could just use arrays with a fixed length. Would that change anything?
5) I have also defined my own types, as in:
type Num int
or
type Name string
Do those hinder performance?
6) Is there a general list of heuristics to watch out for when optimizing Go code? For example, is dereferencing a problem, as it can be in C?
Use bytes.Buffer / Buffer.Write: it handles resizing the internal slice for you, and it is by far the most efficient way to manage multiple []byte values.
As for the 2nd question, it is rather easy to answer with a simple benchmark.
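On the built-in way asked about in question 2: append accepts a whole slice at once via the ... spread, and bytes.Buffer covers the incremental case. A quick sketch of both:

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Built-in: extend one byte slice with another in a single append,
	// with no string conversions and no element-by-element loop.
	arrayOfBytes := []byte("hello")
	newBytesToAppend := []byte(" world")
	arrayOfBytes = append(arrayOfBytes, newBytesToAppend...)

	// For many incremental writes, bytes.Buffer amortises the growth of
	// the backing array and avoids intermediate allocations.
	var buf bytes.Buffer
	buf.Write([]byte("hello"))
	buf.WriteString(" world")

	fmt.Println(string(arrayOfBytes)) // hello world
	fmt.Println(buf.String())         // hello world
}
```

The string(...) + string(...) version from the question allocates two strings and a new backing array on every call, which is exactly the kind of hidden cost a benchmark would surface.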