Storing user-defined logic in a database

I'm designing a database to store information about events that are dynamic in nature. What I mean is that each type of event has some variables attached to it that change on each occurrence, based on rules defined by the user.
Let's say we have Event Type A with variables X and Y. For this event type, the user can define rules that determine the values of X and Y on each occurrence of the event.
An example of a set of rules a user might define:
On first occurrence, X = 0; Y = 0;
On each occurrence, X = X + 1;
On each occurrence, if X == 100 then { X = 0; Y = Y + 1 }
By defining these rules, the values of X and Y change dynamically across occurrences of the event as follows:
1st occurrence: X = 1, Y = 0
2nd occurrence: X = 2, Y = 0
...
100th occurrence: X = 0, Y = 1
Now, I'm not sure how to store these user-defined rules in a database and later query them in my code. Can anyone point me in the right direction? Here's a start:
EVENTS
id;
name;
description;
event_type;
EVENT_TYPE_A_OCCURRENCES
id;
event_id;
X;
Y;
EVENT_RULES
id;
event_id;
frequency; // the frequency in which this rule applies
at_occurrence; // apply this rule at a specific occurrence
condition; // stores the code for the condition
statements; // stores the code for the statements
I'm no expert, so please help me solve this problem. Thank you.

Assume the following user-defined rules stored in a table:
-----------------------------------------------------------------------------
| eventid | occurrence | keep-old-x | keep-old-y | x-frequency | y-frequency |
-----------------------------------------------------------------------------
|    A    |     1      |     T      |     F      |      1      |     100     |
-----------------------------------------------------------------------------
|    B    |     2      |     F      |     T      |     -2      |      0      |
-----------------------------------------------------------------------------
|    C    |     5      |     T      |     T      |     100     |     -3      |
-----------------------------------------------------------------------------
Let's say before the event X = 10, Y = 12.
Event = A, occurrence = 1, keep-old-x = T, keep-old-y = F, x-frequency = 1, y-frequency = 100
if keep-old-x is T then
    X = X + x-frequency
else
    X = x-frequency
endif
if keep-old-y is T then
    Y = Y + y-frequency
else
    Y = y-frequency
endif
Now, X = 11, Y = 100
You may need to add two more columns to change the value of X when it reaches a specific value, for example:
--------------------------
|if-x-value| x-new-value |
--------------------------
| 100 | 0 |
--------------------------
| 125 | 5 |
--------------------------
| 150 | 10 |
--------------------------
I hope this helps.

Related

Creating multiple observations based on multiple other ones

I am trying to find a way to create observations based on multiple other ones using SAS.
For example, I have the following table:
+------+--------------------+-------------------+
| ID | START_DATE | END_DATE |
+------+--------------------+-------------------+
| ABC1 | 01FEB201500:00:00 | 30NOV201600:00:00 |
| ABC1 | 01JAN201700:00:00 | 30NOV201800:00:00 |
+------+--------------------+-------------------+
And I would like to create a table where all the timestamps for the period 01JAN2014 to 31DEC2020 are covered. In other words, it would consist of adding 2 more observations so the dataset looks like this:
+------+--------------------+-------------------+
| ID | START_DATE | END_DATE |
+------+--------------------+-------------------+
| ABC1 | 01FEB201400:00:00 | 31JAN201500:00:00 |
| ABC1 | 01FEB201500:00:00 | 30NOV201600:00:00 |
| ABC1 | 01DEC201600:00:00 | 30NOV201800:00:00 |
| ABC1 | 01DEC201800:00:00 | 31DEC202000:00:00 |
+------+--------------------+-------------------+
The SAS code to re-create this example is:
DATA test;
  INPUT ID :$4. START_DATE :datetime18. END_DATE :datetime18.;
  FORMAT START_DATE datetime20. END_DATE datetime20.;
  CARDS;
ABC1 01FEB201500:00:00 30NOV201600:00:00
ABC1 01JAN201700:00:00 30NOV201800:00:00
;
RUN;
I don't see a way to do this in SAS.
You can fill in (or compute) intra-range gaps using basic comparisons, some holding variables, and a retained variable.
Example:
Presume no ranges overlap and each id's ranges are ordered lowest start first.
data have;
  input id x1 x2;
datalines;
1 3 7
1 11 14
2 4 9
2 15 18
3 1 11
4 11 20
5 1 2
5 3 4
5 5 9
5 10 20
;

data want;
  set have;
  by id;
  length type $6;

  * fill in ranges for every integer 1 through 20;
  if first.id then do;
    bot = 1;
    retain bot;
  end;

  if bot < x1 then do;
    hold1 = x1;
    hold2 = x2;
    x1 = bot;
    x2 = hold1 - 1;
    type = 'gap -';
    output;
    x1 = hold1;
    x2 = hold2;
    type = 'have';
    bot = x2 + 1;
    output;
  end;
  else if x1 <= bot <= x2 then do;
    bot = x2 + 1;
    type = 'have';
    output;
  end;

  if last.id and 20 >= bot > x2 then do;
    type = 'gap +';
    x1 = bot;
    x2 = 20;
    output;
  end;

  keep type id x1 x2 bot;
run;

Recursive Functions return value

I don't seem to understand the logic behind this particular code, and I don't understand why the answer is 18. You can check the answer in a compiler as well. Anyone who understands the logic, please let me know.
Here's the code:
#include <stdio.h>

int GuessMe(int, int);

int main(void) {
    printf("%d", GuessMe(8, 2));
    return 0;
}

int GuessMe(int x, int y) {
    if (y > x)
        return x;
    else
        return GuessMe(x - 2, y + 2) + x;
}
Initially, GuessMe is passed x=8,y=2:
x | y | y > x ? | initial return value | final return value
8 | 2 | NO | GuessMe(6, 4) + 8 | 18
6 | 4 | NO | GuessMe(4, 6) + 6 | 10
4 | 6 | YES | 4 | 4
Read down the initial return value column and then read back up the final return value column once you hit an initial return value that isn't recursive.

Why does this type of power function work?

res = 1;
for (i = 1; i <= n; i <<= 1)  // n = exponent
{
    if (n & i)
        res *= a;  // a = base
    a *= a;
}
This is supposed to be more efficient code for computing a power, and I don't know why it works.
The first line of the for() is fine; I know why i <<= 1 is there. But I don't understand the line if (n & i). I know what it does, but I don't know why it's there...
Let us say you have a binary representation of an unsigned number. How do you find the decimal representation?
Let us take a simple four bit example:
N = | 0 | 1 | 0 | 1 |
-----------------------------------------
| 2^3 = 8 | 2^2 = 4 | 2^1 = 2 | 2^0 = 1 |
-----------------------------------------
| 0 | 4 | 0 | 1 | N = 4 + 1 = 5
Now what would happen if the base wasn't fixed at 2 for each bit but instead was the square of the previous bit and you multiply the contribution from each bit instead of adding:
N = | 0 | 1 | 0 | 1 |
----------------------------
| a^8 | a^4 | a^2 | a^1 |
----------------------------
| 0 | a^4 | 0 | a^1 | N = a^4 * a^1 = a^(4+1) = a^5
As you can see, the code calculates a^n.

Weighted random integers

I want to assign weightings to a randomly generated number, with the weightings represented below.
0 | 1 | 2 | 3 | 4 | 5 | 6
─────────────────────────────────────────
X | X | X | X | X | X | X
X | X | X | X | X | X |
X | X | X | X | X | |
X | X | X | X | | |
X | X | X | | | |
X | X | | | | |
X | | | | | |
What's the most efficient way to do it?
@Kerrek's answer is good.
But if the histogram of weights is not all small integers, you need something more powerful:
Divide [0..1] into intervals sized with the weights. Here you need segments with relative size ratios 7:6:5:4:3:2:1. So the size of one interval unit is 1/(7+6+5+4+3+2+1)=1/28, and the sizes of the intervals are 7/28, 6/28, ... 1/28.
These comprise a probability distribution because they sum to 1.
Now find the cumulative distribution:
P x
7/28 => 0
13/28 => 1
18/28 => 2
22/28 => 3
25/28 => 4
27/28 => 5
28/28 => 6
Now generate a random number r in [0..1] and look it up in this table by finding the smallest x such that r <= P(x). This is the random value you want.
The table lookup can be done with binary search, which is a good idea when the histogram has many bins.
Note you are effectively constructing the inverse cumulative distribution function, so this is sometimes called the method of inverse transforms.
If your array is small, just pick a uniform random index into the following array:
int a[] = {0,0,0,0,0,0,0, 1,1,1,1,1,1, 2,2,2,2,2, 3,3,3,3, 4,4,4, 5,5, 6};
If you want to generate the distribution at runtime, use std::discrete_distribution.
To get the distribution you want, first you basically add up the count of X's in the diagram. You can do it like this (my C is super rusty, so treat this as pseudocode):
int num_cols = 7; // for your example
int max;
if (num_cols % 2 == 0) // even
{
    max = (num_cols+1) * (num_cols/2);
}
else // odd
{
    max = (num_cols+1) * (num_cols/2) + ((num_cols+1)/2);
}
Then you need to randomly select an integer between 1 and max inclusive.
So if your random integer is r the last step is to find which column holds the r'th X. Something like this should work:
for (int i = 0; i < num_cols; i++)
{
    r -= (num_cols - i);
    if (r < 1) return i;
}

Loop unrolling and its effects on pipelining and CPE (have the solution, but don't understand it)

Below the line is a question from a practice test. The table actually has all the solutions filled in. However, I need clarification on why the solutions are what they are. (Read the question below the horizontal line.)
For example, I would really like to understand the solution row for A2 and A3.
As I see it, you have the following situation going on in A2:
x * y
xy * r
xyr * z
Now, let's look at how that'd be in the pipeline:
|1|2|3|4|5|6|7|8 |9|10|11|12|13|14|15|16|17|18|19|20|21|
| | | | | | | | | | | | | | | | | | | | | |
{ x * y } | | | | | | | | | | | | | | | | |
{ xy * r } | | | | | | | | | | | | |
{ xyr * z } | | | | | | | | |
//next iteration, which means different x, y and z's| |
{x2 * y2 } | | | | | | | |
{x2y2 * r } // this is dependent on both previous r and x2y2
{x2y2r * z }
So we are able to overlap xyr * z and x2 * y2, because there are no dependency conflicts. However, that only gets rid of 3 cycles, right?
So it would still be (12 - 3) / 3 = 9 / 3 = 3 cycles per element (three elements). So how are they getting 8/3 CPE for A2?
Any help understanding this concept will be greatly appreciated! There's not a big rush, as the test isn't til next week. If there is any other information you need, please let me know!
(Below is the full test question text, along with the table completely filled in with the solutions)
Consider the following function for computing the product of an array of n integers.
We have unrolled the loop by a factor of 3.
int prod(int a[], int n) {
    int i, x, y, z;
    int r = 1;
    for (i = 0; i < n-2; i += 3) {
        x = a[i]; y = a[i+1]; z = a[i+2];
        r = r * x * y * z; // Product computation
    }
    for (; i < n; i++)
        r *= a[i];
    return r;
}
For the line labeled Product computation, we can use parentheses to create five different
associations of the computation, as follows:
r = ((r * x) * y) * z; // A1
r = (r * (x * y)) * z; // A2
r = r * ((x * y) * z); // A3
r = r * (x * (y * z)); // A4
r = (r * x) * (y * z); // A5
We express the performance of the function in terms of the number of cycles per element
(CPE). As described in the book, this measure assumes the run time, measured in clock
cycles, for an array of length n is a function of the form Cn + K, where C is the CPE.
We measured the five versions of the function on an Intel Pentium III. Recall that the integer multiplication operation on this machine has a latency of 4 cycles and an issue time of 1 cycle.
The following table shows some values of the CPE, and other values missing. The measured
CPE values are those that were actually observed. “Theoretical CPE” means that performance
that would be achieved if the only limiting factor were the latency and issue time of
the integer multiplier.
Fill in the missing entries. For the missing values of the measured CPE, you can use the
values from other versions that would have the same computational behavior. For the values
of the theoretical CPE, you can determine the number of cycles that would be required for
an iteration considering only the latency and issue time of the multiplier, and then divide by 3.
Without knowing the CPU architecture, we can only guess.
My interpretation would be that the timing diagram only shows part of the pipeline, from gathering the operands to writing the result, because this is what is relevant to dependency resolution.
Now, the big if: If there is a buffer stage between the dependency resolver and the execution units, it would be possible to start the third multiplication of the first group (3) and the first multiplication of the second group (4) both at offset 8.
As 3 is dependent on 2, it does not make sense to use a different unit here, so 3 is queued to unit 1 right after 2. The following instruction, 4, is not dependent on a previous result, so it can be queued to unit 2 and started in parallel.
In theory, this could happen as early as cycle 6, giving a CPE of 6/3. In practice, that is dependent on the CPU design.