I'm having trouble with this:
I have the following query which gives me exactly the view I need.
SELECT t.Week,
SUM(t.Hours) AS AH,
SUM(c.Input) AS input,
SUM(c.[recruitedPos]) AS recruitedPos,
(SUM(t.Hours) - SUM(c.Input)) AS Possible
FROM tempPOC AS t
LEFT JOIN (SELECT Week,
SUM(input) AS Input,
SUM([Normal Weekly Hours]) AS recruitedPos,
Shop
FROM colData
GROUP BY week,
Shop) AS c ON t.Week = c.Week
AND c.Shop = t.Store
GROUP BY t.Week,
t.Store
ORDER BY t.Week;
What I'm having trouble with is writing a CASE expression to set any value of the Possible alias to 0 if it is lower than 0.
I thought this would be as simple as
(CASE(SUM(t.Hours - c.Hours)) < 0 then 0 else Possible end as Possible)
but this gives me the error
Msg 156, Level 15, State 1, Line 9
Incorrect syntax near the keyword 'as'.
Input:
Week, AH, Input, RecruitedPos, Possible
1, 15, 25, 13, -10
1, 30, 15, 15, 15
Expected output:
Week, AH, Input, RecruitedPos, Possible
1, 15, 25, 13, 0
1, 30, 15, 15, 15
CASE
WHEN SUM(t.Hours) - SUM(c.Input) < 0 THEN 0
ELSE SUM(t.Hours) - SUM(c.Input)
END AS Possible
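If it helps to see the clamping logic in isolation, here is a minimal Python sketch over the sample rows above (column names taken from the expected output; this just mirrors what the CASE expression does, it is not part of the SQL fix):

```python
# Clamp the computed Possible value at zero, mirroring the SQL CASE expression.
rows = [
    {"Week": 1, "AH": 15, "Input": 25, "RecruitedPos": 13},
    {"Week": 1, "AH": 30, "Input": 15, "RecruitedPos": 15},
]

for row in rows:
    possible = row["AH"] - row["Input"]
    # CASE WHEN SUM(t.Hours) - SUM(c.Input) < 0 THEN 0 ELSE ... END
    row["Possible"] = possible if possible >= 0 else 0

print([row["Possible"] for row in rows])  # [0, 15]
```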
I am trying to properly understand the batch_input and batch_labels from the tensorflow "Vector Representations of Words" tutorial.
For instance, my data is:
1 1 1 1 1 1 1 1 5 251 371 371 1685 ...
and my code starts with:
skip_window = 2 # How many words to consider left and right.
num_skips = 1 # How many times to reuse an input to generate a label.
Then the generated input array is:
batch_input = 1 1 1 1 1 1 5 251 371 ....
This makes sense: it starts after the first 2 words (= window size) and then continues. The labels:
batch_labels = 1 1 1 1 1 1 251 1 1685 371 589 ...
I don't understand these labels very well. There are supposed to be 4 labels for each input, right (window size 2, on each side)? But the batch_labels variable is the same length as batch_input.
From the TensorFlow tutorial:
The skip-gram model takes two inputs. One is a batch full of integers
representing the source context words, the other is for the target
words.
As per the tutorial, I have declared the two variables as:
batch = np.ndarray(shape=(batch_size), dtype=np.int32)
labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
How should I interpret the batch_labels?
There are supposed to be 4 labels for each input, right (window size 2, on each side)? But the batch_labels variable is the same length.
The key setting is num_skips = 1. This value defines the number of (input, label) tuples each word generates. See the examples with different num_skips below (my data sequence seems to be different from yours, sorry about that).
Example #1 - num_skips=4
batch, labels = generate_batch(batch_size=8, num_skips=4, skip_window=2)
It generates 4 labels for each word, i.e. uses the whole context; since batch_size=8, only 2 words are processed in this batch (12 and 6); the rest will go into the next batch:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [12 12 12 12 6 6 6 6]
labels = [[6 3084 5239 195 195 3084 12 2]]
Example #2 - num_skips=2
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=2)
Here you would expect each word to appear twice in the batch sequence; the 2 labels are randomly sampled from the 4 possible context words:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [ 12 12 6 6 195 195 2 2]
labels = [[ 195 3084 12 195 3137 12 46 195]]
Example #3 - num_skips=1
batch, labels = generate_batch(batch_size=8, num_skips=1, skip_window=2)
Finally, this setting, the same as yours, produces exactly one label per word; each label is drawn randomly from the 4-word context:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [ 12 6 195 2 3137 46 59 156]
labels = [[ 6 12 12 195 59 156 46 46]]
How should I interpret the batch_labels?
Each label is a context word to be predicted from the center word. But the generated data may not include all (context, center) tuples, depending on the settings of the generator.
Also note that each training example carries just one label: Skip-Gram trains the model to predict a single context word from the given center word, not all 4 context words at once. This is why all the training pairs (12, 6), (12, 3084), (12, 5239) and (12, 195) are valid.
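To make the sampling concrete, here is a simplified sketch of what a generate_batch like the tutorial's does. The real generator picks context words at random; this version takes them in order (an assumption for readability), so the centers match the examples above but the label order is deterministic:

```python
import numpy as np

def generate_batch(data, batch_size, num_skips, skip_window):
    """Simplified, deterministic sketch of the tutorial's batch generator."""
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size,), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1  # [skip_window context] center [skip_window context]
    index = 0
    for i in range(batch_size // num_skips):
        window = data[index:index + span]
        center = window[skip_window]
        context = window[:skip_window] + window[skip_window + 1:]
        for j in range(num_skips):  # the real code samples context randomly
            batch[i * num_skips + j] = center
            labels[i * num_skips + j, 0] = context[j]
        index += 1  # slide the window one word to the right
    return batch, labels

data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572]
batch, labels = generate_batch(data, batch_size=8, num_skips=4, skip_window=2)
print(batch)  # centers: 12 four times, then 6 four times
```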
Let's limit the neighborhood radius to n=1 (which means we always need 3 cells to evaluate the next-generation cell).
Here's an example of a 2-state rule. Note that the upper row of the rule table is generated in a particular order, whereas the lower row is the bit representation of the number 30.
I cannot find a single visualization of the equivalent for a 3-state CA. Following the logic of the 2-state CA, it should contain 27 possible outcomes, but I have no clue in which order they should be generated. The lower row should be 30 in ternary (with leading zeroes to occupy a total of 27 positions).
Is there a general algorithm for generating these permutations in the conventional order of CAs (regardless of the number of states)?
Thank you very much in advance and sorry if the question is stupid. :(
What you are using is called a Wolfram code (after Stephen Wolfram), which is used for elementary CAs.
If you use more states or bigger neighborhoods, it extends naturally.
Your question is not stupid.
For three states this gives you ternary numbers. First write all the three-digit ternary numbers in descending order:
222, 221, 220, 212, 211, 210, 202, 201, 200, 122, 121, 120, 112, 111, 110, 102, 101, 100, 022, 021, 020, 012, 011, 010, 002, 001, 000
There are 27 of them (3^3), and 222_3 = 26, 221_3 = 25, ..., 001_3 = 1, 000_3 = 0.
Now decompose 30 into a 27-digit base-3 number: 30 = 1*3^3 + 1*3^1, so only two digits equal 1: the fourth and the second (from the right). Here is rule 30 for a radius-1, 3-state CA:
000000000000000000000001010
This CA behaves very differently from the rule 30 radius-1 2-state CA.
Here is rule 33 for a radius-1, 3-state CA (33 = 1*3^3 + 2*3^1):
000000000000000000000001020
So, for n states and radius r, enumerate in descending order all (2r+1)-digit numbers in base n and associate with each of them a value in [0, n).
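The digit-string construction generalizes directly. A small sketch (the function name rule_table is mine; it just writes the rule number in base n, padded to one digit per neighborhood):

```python
def rule_table(rule_number, n_states, radius):
    """Return the rule as a base-n digit string, most significant digit first.

    The k-th digit from the right gives the next state for the neighborhood
    whose base-n value is k; there are n_states ** (2*radius + 1) neighborhoods.
    """
    num_digits = n_states ** (2 * radius + 1)
    digits = []
    for _ in range(num_digits):
        digits.append(str(rule_number % n_states))  # extract least significant digit
        rule_number //= n_states
    return "".join(reversed(digits))

print(rule_table(30, 3, 1))  # 000000000000000000000001010
print(rule_table(30, 2, 1))  # 00011110  (elementary rule 30)
```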
It's hard to know what terms to search for on Stack Overflow for this problem. Say you have a target array of numbers like [100, 250, 400, 60].
I want to be able to score how close other arrays are to this target, based on a threshold / error bars of, say, 10. So for example, the array:
[90, 240, 390, 50] would get a high score (or positive match result) because of the error bars.
The order matters, so
[60, 400, 250, 100] would get a zero score (or negative match result).
The arrays can be different sizes, so
[33, 77, 300, 110, 260, 410, 60, 99, 23] would get a good score (or positive match result).
A good way to think about the problem is to imagine these numbers are frequencies of musical notes like C,G,E,F and I'm trying to match a sequence of notes against a target.
Searching Stack Overflow, I'm not sure if this post will work, but it's close:
Compare difference between multiple numbers
Update 17th Jan 2015:
I failed to mention a scenario that might affect the current answers. If the array has noise between the target numbers, I still want to find a positive match. For example: [33, 77, 300, 110, 260, 300, 410, 40, 60, 99, 23].
I believe what you're looking for is sequence similarity.
You can read about it on this Wikipedia page. Your case seems to fit the local alignment category. There are some algorithms you can choose from:
Needleman–Wunsch algorithm
Levenshtein distance
However, since these algorithms compare strings, you have to design your own scoring rules for inserting, deleting, or comparing numbers.
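As a rough sketch of adapting such an edit distance to numbers, here is a Levenshtein-style DP where "equal" is replaced by a tolerance check (the +-10 threshold comes from the question; the unit insert/delete costs are an assumption you would tune):

```python
def sequence_distance(target, candidate, tolerance=10):
    """Edit distance where two numbers 'match' when within +/- tolerance.

    Substitution costs 0 on a match, 1 otherwise; insert/delete cost 1 each.
    """
    m, n = len(target), len(candidate)
    # dp[i][j] = distance between target[:i] and candidate[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if abs(target[i - 1] - candidate[j - 1]) <= tolerance else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete from target
                           dp[i][j - 1] + 1,        # insert into target
                           dp[i - 1][j - 1] + sub)  # match / substitute
    return dp[m][n]

print(sequence_distance([100, 250, 400, 60], [90, 240, 390, 50]))   # 0
print(sequence_distance([100, 250, 400, 60], [60, 400, 250, 100]))  # > 0 (order matters)
```

A lower distance means a closer match; a reversed array scores badly because the alignment is order-sensitive.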
Sounds like what you're looking for is the RMS error, where RMS is the square Root of the Mean Squared error. Let me illustrate by example. Assume the target array is [100, 250, 400, 60] and the array to be scored is [104, 240, 410, 55].
First compute the difference values, i.e. the errors
100 250 400 60
-104 -240 -410 -55
---- ---- ---- ---
-4 10 -10 5
Then square the errors to get 16 100 100 25. Compute the mean of the squared errors
(16 + 100 + 100 + 25) / 4 = 60.25
And finally, take the square root sqrt(60.25) = 7.76
When the arrays are different sizes, you can speed things up by only computing the RMS error if the first value is within a certain threshold, say +- 30. Using the example [33, 77, 300, 110, 260, 410, 60, 99, 23], there would only be two alignments to check, because with the other alignments the first number is more than 30 away from 100
   33    77   300   110   260   410    60    99    23
        100   250   400    60                          --> RMS score = 178
                    100   250   400    60              --> RMS score = 8.7
Low score wins!
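Putting the two ideas together, here is a minimal Python sketch of the sliding-window RMS with the first-value pre-filter (function names are mine):

```python
import math

def rms_error(target, window):
    """Square root of the mean squared difference between two equal-length lists."""
    return math.sqrt(sum((t - w) ** 2 for t, w in zip(target, window)) / len(target))

def best_alignment(target, candidate, first_value_threshold=30):
    """Slide target over candidate; score only alignments whose first value is close."""
    best = None
    for start in range(len(candidate) - len(target) + 1):
        window = candidate[start:start + len(target)]
        if abs(window[0] - target[0]) > first_value_threshold:
            continue  # cheap pre-filter before computing the RMS
        score = rms_error(target, window)
        if best is None or score < best[1]:
            best = (start, score)
    return best

target = [100, 250, 400, 60]
candidate = [33, 77, 300, 110, 260, 410, 60, 99, 23]
print(best_alignment(target, candidate))  # start index 3, RMS ~ 8.66
```

Only the alignments starting at 77 and 110 pass the pre-filter, matching the worked example above; the one at 110 wins.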
What is the best way to store a list of random numbers (like lotto/bingo numbers) and retrieve them? I'd like to store in a database a number of rows, where each row contains 5-10 numbers ranging from 0 to 90. I will store a large number of those rows. What I'd like to be able to do is retrieve the rows that have at least X numbers in common with a newly generated row.
Example:
[3,4,33,67,85,99]
[55,56,77,89,98,99]
[3,4,23,47,85,91]
Those rows are in the DB.
I will generate this:
[1,2,11,45,47,88], and now I want to get the rows that have at least 1 number in common with this one.
The easiest (and dumbest?) way is to run 6 SELECTs and check for similar results.
I thought about storing the numbers as a large binary string like
000000000000000000000100000000010010110000000000000000000000000
with 99 positions, where each position represents a number from 1 to 99: if there is a 1 at the 44th position, it means that row contains 44. This method probably just shifts the hard work to the DB, and again it's not very smart.
Any suggestion?
You should create a table like so:
TicketId Number
1 3
1 4
1 33
1 67
1 85
1 99
2 55
2 56
2 77
etc...
Then your query, at least for X = 1, becomes:
SELECT DISTINCT TicketId FROM Ticket WHERE Number IN (1, 2, 11, 45, 47, 88)
The advantage of this is that you can use an index instead of a full table scan.
For X greater than one, you could do the following (here with X = 3):
SELECT TicketId, COUNT(*) AS cnt
FROM Ticket WHERE Number IN (1, 2, 11, 45, 47, 88)
GROUP BY TicketId
HAVING COUNT(*) >= 3
Again this will be able to use the index.
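For illustration only, the same overlap-counting logic in Python with in-memory sets (the SQL above is what actually scales to a big table):

```python
# Tickets from the question, keyed by TicketId, stored as sets of numbers.
tickets = {
    1: {3, 4, 33, 67, 85, 99},
    2: {55, 56, 77, 89, 98, 99},
    3: {3, 4, 23, 47, 85, 91},
}

def matches(tickets, drawn, x):
    """Return ids of tickets sharing at least x numbers with the drawn row."""
    drawn = set(drawn)
    return [tid for tid, nums in tickets.items() if len(nums & drawn) >= x]

print(matches(tickets, [1, 2, 11, 45, 47, 88], 1))  # [3]  (47 is the shared number)
```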