Binary classification of sensor data using minimal code space [closed]

I am trying to classify the events above as 1 or 0: 1 would be the lower values and 0 would be the higher values. Usually the data does not look as clean as this. Currently my approach uses two different thresholds: in order to go from 0 to 1 the signal has to go past the 1-to-0 threshold and stay above it for 20 sensor values. This threshold is set to the highest value I receive minus ten percent of that value. I don't think a machine learning approach will work, because I have too few features to work with and the implementation has to take up minimal code space. I am hoping someone can point me in the direction of a known algorithm that applies well to this sort of problem; googling it and checking my other sources isn't producing great results. The current implementation is very effective and the hardware isn't going to change.

Currently my approach uses two different thresholds: in order to go from 0 to 1 the signal has to go past the 1-to-0 threshold and stay above it for 20 sensor values
Calculate the area on your graph under those 20 sensor values. If the area is greater than a threshold (perhaps half the peak value), assign it 1; else assign it 0.
Since your measurements are one unit wide (pixels, or sensor readings), the area ends up being the sum of the 20 sensor values.
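In C this amounts to a running sum and one compare, so it costs almost no code space. A minimal sketch, assuming a 20-sample window and reading "half the peak value" as requiring the window mean to exceed half the peak (both WINDOW and the exact threshold rule are assumptions to tune, not from the post):

#define WINDOW 20

/* Returns 1 if the summed area of the window exceeds the threshold, else 0.
   peak is the highest sensor value seen so far. */
int classify_window(const int samples[WINDOW], int peak)
{
    long area = 0;
    for (int i = 0; i < WINDOW; i++)
        area += samples[i];                 /* columns are one unit wide, so area == sum */
    return area > (long)peak * WINDOW / 2;  /* window mean above half the peak */
}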

Related

Array or Map is better in database? [closed]

I am using AWS dynamoDB.
option 1
[{"id":"01","avaliable":true},
{"id":"02","avaliable":true},
{"id":"03","avaliable":false},
{"id":"04","avaliable":true}
{"id":"05","avaliable":false}]
option 2
"avaliable":[true,true,false,true,false]
id will always be in sequence and start from 01, so I think it is a waste to include "id" as an attribute. With option 2 I can just update "avaliable" using {id-1} as the array index. But I am not sure whether there will be any other issues if I use option 2. I was originally using option 1 and would check whether the id was correct before updating. I am afraid option 2 will lead to mistakes more easily.
Which structure do you think is better?
Personally I prefer to use the Map type in DynamoDB because it allows you to update by key instead of guessing which index you need in an array. However, that would be option 3:
"mymap":{
"id01":{"avaliable":true},
"id02":{"avaliable":true},
"id03":{"avaliable":true},
"id04":{"avaliable":true},
"id05":{"avaliable":true}
}
This allows you to modify elements without first figuring out their position in the array, which otherwise sometimes requires you to read the item first and can cause concurrency issues.
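For example, a hypothetical UpdateItem request (the key names and values here are illustrative, not from the question) can flip a single flag by key without reading the item first:

UpdateExpression: "SET mymap.id03.avaliable = :v"
ExpressionAttributeValues: { ":v": { "BOOL": false } }

With option 2 the same change becomes SET avaliable[2] = :v, where the index arithmetic is exactly the part that invites off-by-one mistakes.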
I do notice you mention that the id maps directly to the item's position in the array; however, I feel the map is a more fool-proof way for a general implementation. For example, if you need to remove a value from the middle of the list, it would not cause any issues.
That is one thing that can influence your decision; the other two are item size and total storage.
If your item size is substantially less than 1 KB, you will have no issue using option 1 or 3, which increase your item size slightly compared to option 2, as long as the extra characters do not push your average item size over the next 1 KB boundary, since that would increase your capacity consumption for write requests.
The other factor is total storage size. DynamoDB provides a free tier of 25 GB of storage. If you have millions of items and the extra attributes would grow your storage substantially, you may decide to use option 2.

How to stuff any number of values in 8-10 bytes of data for n number of 16 bit values? [closed]

I am working on an algorithm where I can have any number of 16-bit values (for instance 1000 16-bit values, all sensor data, so no particular series or repetition). I want to stuff all of this data into an 8 or 10 byte array (each and every one of the 1000 16-bit values should be contained in the 10-byte array). The encoding should be such that I can also easily decode and read each and every one of the 1000 values.
I have thought of using the sin function, dividing the values by 100 so every data point would always fit in 8 bits (the 0-1 sine value range), but that only covers a small range of data, not a huge number of values.
Pardon me if I am asking for too much. I am just curious whether it is possible or not.
The answer to this question is rather obvious with a little knowledge of information theory: it is not possible to store that much information in so little memory, because the data you are talking about simply contains too much information. A simple counting argument shows why: 1000 16-bit values can take any of 2^16000 distinct combinations, while 10 bytes can distinguish at most 2^80 messages, so almost all inputs could never be reconstructed.
Some data, like repetitive data or data following some structure (such as constantly rising values), contains very little information. The task of compression algorithms is to figure out the structure or repetition and, instead of storing the raw data, store the structure or the rule for reproducing it.
In your case the data comes from sensors, and unless you are willing to lose a massive amount of information, you will not be able to compress it by a factor of the magnitude you are talking about (1000 × 2 bytes into 10 bytes). If your sensors produce more or less the same values all the time with just a little jitter, a good compression ratio can be achieved (though for that your question is far too broad to be answered here), but it will probably never be in the range of reducing your 1000 values to 10 bytes.

How to implement a MATLAB lowpass filter in C [closed]

I have designed a board that samples audio input at 48 kHz using a 16-bit ADC and stores the data as signed 16-bit integers. I have also implemented a 16-bit DAC on the board, and I am able to pass audio through the board successfully.
I would like to design a low-pass filter in MATLAB and implement it on this board. I understand how to create basic filters in MATLAB, but I can't quite grasp how to bridge the gap between designing the filter in MATLAB and implementing it in C code on my board. I would like to pass a signal into the board and observe the filtered signal at the output in 'real time'.
How can this be achieved?
OK, say you get your coefficients from [B,A] = butter(...) or a similar filter design function (get them in the Z domain, i.e. as a digital filter). Those A, B coefficients correspond to a simple transfer function you know:
H(z) = B(z)/A(z) = (b(1) + b(2)z^-1 + ... + b(n+1)z^-n) / (a(1) + a(2)z^-1 + ... + a(n+1)z^-n)
right?
You just need to remember that the output is Y(z) = H(z)·X(z), or in other words
Y(z) = B(z)/A(z) · X(z), and finally A(z)·Y(z) = B(z)·X(z)
And what does x(t) · z^-1 correspond to in the time domain? Yep, x(t-1).
That means you'll end up with an equation similar to:
y(t)a(1) + y(t-1)a(2) + ... + y(t-n)a(n+1) = x(t)b(1) + x(t-1)b(2) + ... + x(t-n)b(n+1)
What we need is the current value y(t), given the current x(t), the past x(t-1) etc., and the known, stored past outputs y(t-1) etc.:
y(t) = 1/a(1) · (x(t)b(1) + x(t-1)b(2) + ... + x(t-n)b(n+1) - y(t-1)a(2) - ... - y(t-n)a(n+1))
That means you need two arrays, for x and y, and you apply that equation with the B and A arrays you got from MATLAB.
Sadly, this assumes you ALREADY took the sampling time into consideration in butter() (hence Wn should be normalized to the sample rate), and you must take your samples at exactly that sampling interval (and ideally compute your output at the exact time too).
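In C, that recurrence maps directly onto a small Direct Form I filter. A minimal sketch, assuming float samples, a(1) normalized to 1 (MATLAB's butter() returns it that way), and that you paste the printed B and A coefficients into the arrays (the names here are mine, not from the post):

#define N_ORDER 4                           /* must match the N used in butter(N, Wn) */

static const float b[N_ORDER + 1] = { 0 };  /* paste B from MATLAB here */
static const float a[N_ORDER + 1] = { 1 };  /* paste A from MATLAB here, a[0] == 1 */

static float x_hist[N_ORDER + 1];           /* x(t), x(t-1), ..., x(t-n) */
static float y_hist[N_ORDER + 1];           /* y(t), y(t-1), ..., y(t-n) */

/* Call once per ADC sample, at exactly the rate Wn was normalized to. */
float filter_sample(float x_new)
{
    /* age the histories by one sample */
    for (int i = N_ORDER; i > 0; i--) {
        x_hist[i] = x_hist[i - 1];
        y_hist[i] = y_hist[i - 1];
    }
    x_hist[0] = x_new;

    /* y(t) = sum b[i]*x(t-i) - sum a[i]*y(t-i), with a[0] == 1 */
    float y = 0.0f;
    for (int i = 0; i <= N_ORDER; i++)
        y += b[i] * x_hist[i];
    for (int i = 1; i <= N_ORDER; i++)
        y -= a[i] * y_hist[i];

    y_hist[0] = y;
    return y;
}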

A/B testing sorting algorithm [closed]

I want to make an algorithm to conduct A/B testing over a variable number of subjects with a variable number of properties per subject.
For example, I have 1000 people with the following properties: they come from two departments, some are managers, some are women, etc. These properties may increase or decrease according to the situation.
I want an algorithm that splits the population in two with the best possible representation of all the properties in both A and B. So I want two groups of 500 people with an equal number from both departments in each, an equal number of managers, and an equal number of women. More specifically, I would like to maintain the ratio of each property in both A and B: if we have 10% managers, I want 10% of sample A and 10% of sample B to be managers.
Any pointers on where to begin? I am pretty sure such an algorithm exists. I have a gut feeling this may be unsolvable in some cases, as there may be an odd number of people who are managers AND women AND in Dept. 1.
Make a list of permutations of all a/b variables.
Dept1,Manager,Male
Dept1,Manager,Female
Dept1,Junior,Male
...
Dept2,Junior,Female
Go through all the people and assign each to their respective permutation. Maybe randomise the order of the people first, just to be sure there is no bias in the order they are added to each permutation.
Dept1,Manager,Male-> Person1, Person16, Person143...
Dept1,Manager,Female-> Person7, Person10, Person83...
Have a second process go through each permutation and assign half the people to one test group and half to the other (see the sketch below). You will need to account for odd numbers of people in a group, but that should be fairly easy to factor in; obviously a larger sample size will reduce the impact of these odd numbers on the final results.
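A minimal C sketch of both steps, assuming each property has already been encoded as a small integer (the Person fields and the 3-bit stratum key are illustrative assumptions, not from the question):

#include <stdlib.h>

typedef struct {
    int dept;     /* 0 or 1 */
    int manager;  /* 0 or 1 */
    int female;   /* 0 or 1 */
    int group;    /* output: 0 = group A, 1 = group B */
} Person;

/* Pack the binary properties into one "permutation" (stratum) key. */
static int stratum(const Person *p)
{
    return (p->dept << 2) | (p->manager << 1) | p->female;
}

static int by_stratum(const void *a, const void *b)
{
    return stratum((const Person *)a) - stratum((const Person *)b);
}

void ab_split(Person *people, int n)
{
    /* Fisher-Yates shuffle first, so input order carries no bias. */
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        Person tmp = people[i];
        people[i] = people[j];
        people[j] = tmp;
    }

    /* Group identical permutations together, then alternate A/B inside each. */
    qsort(people, n, sizeof(Person), by_stratum);
    int prev = -1, next_group = 0;
    for (int i = 0; i < n; i++) {
        int s = stratum(&people[i]);
        if (s != prev) {
            prev = s;
            next_group = 0;   /* restart alternation for each new permutation */
        }
        people[i].group = next_group;
        next_group ^= 1;
    }
}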
The algorithm for splitting the groups is simple: take each group of people who have all dimensions in common and assign half to the treatment and half to the control. You don't need to worry about odd numbers of people; whatever statistical test you use will account for that. If some dimension is very skewed (e.g., there are only 2 women in your entire sample), it may be wise to throw that dimension out.
Simple A/B tests usually use a t-test or G-test, but in your case you'd be better off using an ANOVA to determine the significance of the treatment on each of the individual dimensions.

Partial sine data fit code C [closed]

I have 8 data points that form the peak of a partial sine wave. I am trying to fit these to an equation so I can find the true maximum position (which most likely lies between the data points). The code will be in C. Does anyone have any info on algorithms, or ideally code samples?
Since the data points are all near a maximum, the wave y = A*sin(B*x + C) + D can be approximated as a parabola much like the first 2 terms of cos(x) = (1.0 - x*x/2! + ...).
So find the best fit parabola for the 8 data points and calculate the maximum.
Searching for "peak detection via quadratic fit" in C turns up plenty of examples.
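A sketch of that least-squares parabola fit in C (the function name and the x = 0..n-1 convention are my own assumptions; it solves the 3x3 normal equations by Cramer's rule):

double parabola_peak(const double y[], int n)
{
    /* Fit y = a*x^2 + b*x + c over x = 0..n-1 by least squares. */
    double s0 = n, s1 = 0, s2 = 0, s3 = 0, s4 = 0;
    double sy = 0, sxy = 0, sx2y = 0;
    for (int i = 0; i < n; i++) {
        double x = i, xx = x * x;
        s1 += x;  s2 += xx;  s3 += xx * x;  s4 += xx * xx;
        sy += y[i];  sxy += x * y[i];  sx2y += xx * y[i];
    }
    /* Normal equations:
       [s4 s3 s2] [a]   [sx2y]
       [s3 s2 s1] [b] = [sxy ]
       [s2 s1 s0] [c]   [sy  ]   -- solved for a and b by Cramer's rule. */
    double det = s4 * (s2 * s0 - s1 * s1)
               - s3 * (s3 * s0 - s1 * s2)
               + s2 * (s3 * s1 - s2 * s2);
    double a = (sx2y * (s2 * s0 - s1 * s1)
              - s3   * (sxy * s0 - s1 * sy)
              + s2   * (sxy * s1 - s2 * sy)) / det;
    double b = (s4   * (sxy * s0 - s1 * sy)
              - sx2y * (s3 * s0 - s1 * s2)
              + s2   * (s3 * sy - sxy * s2)) / det;
    return -b / (2.0 * a);   /* x position of the fitted parabola's vertex */
}

Call it with your 8 samples and n = 8; the return value is the sub-sample position of the maximum.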
Provided your sample values form a "hump", i.e. increasing followed by decreasing samples, you could try viewing the sample values as "weights" and computing the "center of gravity":
float cog = 0.0f, total = 0.0f;
for (int i = 0; i < num_samples; i++) {
    cog   += i * samples[i];   /* position-weighted sum */
    total += samples[i];       /* total weight */
}
cog /= total;                  /* weighted mean position of the hump */
I've used that in similar cases in the past.
NOTE: This scheme only works if the set of samples used contains a single peak, which the question's phrasing certainly made me think is the case. Finding locations of interest can easily be done by monitoring whether sample values are increasing or decreasing, selecting an "interesting" range of samples, and computing the peak location as described.
Also note that if the actual goal is to determine the phase or frequency of an input sine wave, it would be a lot better to correlate the signal against a reference set of sine waves (in other words, do a Fourier transform).
