Understanding input and labels in word2vec (TensorFlow) - arrays

I am trying to properly understand the batch_input and batch_labels from the tensorflow "Vector Representations of Words" tutorial.
For instance, my data starts with
1 1 1 1 1 1 1 1 5 251 371 371 1685 ...
and I use these settings:
skip_window = 2 # How many words to consider left and right.
num_skips = 1 # How many times to reuse an input to generate a label.
Then the generated input array is:
batch_input = 1 1 1 1 1 1 5 251 371 ....
This makes sense: it starts after the first 2 words (= window size) and then continues. The labels:
batch_labels = 1 1 1 1 1 1 251 1 1685 371 589 ...
I don't understand these labels very well. There are supposed to be 4 labels for each input right (window size 2, on each side). But the batch_label variable is the same length.
From the tensorflow tutorial:
The skip-gram model takes two inputs. One is a batch full of integers
representing the source context words, the other is for the target
words.
As per the tutorial, I have declared the two variables as:
batch = np.ndarray(shape=(batch_size), dtype=np.int32)
labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
How should I interpret the batch_labels?

There are supposed to be 4 labels for each input right (window size 2, on each side). But the batch_label variable is the same length.
The key setting is num_skips = 1. This value defines the number of (input, label) tuples each word generates. See the examples with different num_skips below (my data sequence seems to be different from yours, sorry about that).
Example #1 - num_skips=4
batch, labels = generate_batch(batch_size=8, num_skips=4, skip_window=2)
It generates 4 labels for each word, i.e. it uses the whole context. Since batch_size=8, only 2 words are processed in this batch (12 and 6); the rest will go into the next batch:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [12 12 12 12 6 6 6 6]
labels = [[6 3084 5239 195 195 3084 12 2]]
Example #2 - num_skips=2
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=2)
Here you would expect each word to appear twice in the batch sequence; the 2 labels are randomly sampled from the 4 possible words:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [ 12 12 6 6 195 195 2 2]
labels = [[ 195 3084 12 195 3137 12 46 195]]
Example #3 - num_skips=1
batch, labels = generate_batch(batch_size=8, num_skips=1, skip_window=2)
Finally, this setting, the same as yours, produces exactly one label per word; each label is drawn randomly from the 4-word context:
data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, ...]
batch = [ 12 6 195 2 3137 46 59 156]
labels = [[ 6 12 12 195 59 156 46 46]]
How should I interpret the batch_labels?
Each label is a context word to be predicted from the corresponding input (center) word. The generated data may not include every (center, context) pair, depending on the settings of the generator.
Also note that the train_labels tensor holds a single label per input, like the labels array you declared with shape (batch_size, 1). Skip-Gram trains the model to predict any one context word from the given center word, not all 4 context words at once. This explains why all training pairs (12, 6), (12, 3084), (12, 5239) and (12, 195) are valid.
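To make the role of num_skips concrete, here is a minimal sketch of a batch generator. It is not the tutorial's generate_batch: the explicit data and start arguments and the rng parameter are my own simplifications, assumed for illustration, and the sampled labels will differ from run to run.

```python
import random

def generate_batch(data, start, batch_size, num_skips, skip_window, rng=random):
    """Return (batch, labels) for batch_size // num_skips consecutive center words.

    `start` is the index of the first center word (hypothetical parameter,
    not part of the tutorial's function); it must be >= skip_window.
    """
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    assert start >= skip_window
    batch, labels = [], []
    for center in range(start, start + batch_size // num_skips):
        # Indices of all words within skip_window of the center word.
        context = [i for i in range(center - skip_window, center + skip_window + 1)
                   if i != center]
        # Keep num_skips of them, chosen at random without replacement.
        for i in rng.sample(context, num_skips):
            batch.append(data[center])
            labels.append([data[i]])  # one label per row, i.e. shape (batch_size, 1)
    return batch, labels

data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572]
for num_skips in (4, 2, 1):
    batch, labels = generate_batch(data, start=2, batch_size=8,
                                   num_skips=num_skips, skip_window=2)
    print(num_skips, batch, labels)
```

With num_skips=4 every context word is used, so only the order of the labels varies between runs; with num_skips=1 each center word gets one randomly chosen neighbour as its label.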


How are 3-state cellular automata rules generated?

Let's limit the neighborhood to n=1 (which means we always need 3 cells to evaluate the next-gen cell).
Here's an example of a 2 state rule. Note that the upper row of the rules are generated in a particular order, whereas the lower row is the bit representation of the number 30.
I cannot find a single visualization of the equivalent for a 3 state CA. Following the logic of 2 state CA, it should contain 27 possible outcomes, but I have no clue in which order they should be generated. The lower row should be 30 in ternary (with leading zeroes to occupy a total of 27 positions).
Is there a general algorithm for generating these permutations in the conventional order of CAs (regardless of the number of states)?
Thank you very much in advance and sorry if the question is stupid. :(
What you are using is called the Wolfram code (after Stephen Wolfram), which is used for elementary CAs.
If you use more states or bigger neighborhoods, it is sufficient to extend it naturally.
Your question is not stupid.
For three states, this will give you ternary numbers. First write all the three-digit numbers in ternary (descending order):
222, 221, 220, 212, 211, 210, 202, 201, 200, 122, 121, 120, 112, 111, 110, 102, 101, 100, 022, 021, 020, 012, 011, 010, 002, 001, 000
There are 27 of them (3^3), with 222_3 = 26, 221_3 = 25, ..., 001_3 = 1, 000_3 = 0.
Now write 30 as a 27-digit number in base 3: 30 = 1*3^3 + 1*3^1, so only two digits are equal to 1, the fourth and the second (from the right). Here is rule 30 for the radius-1, 3-state CA:
000000000000000000000001010
This CA behaves very differently from rule 30 of the radius-1, 2-state CA.
Here is rule 33 for the radius-1, 3-state CA (33 = 1*3^3 + 2*3^1):
000000000000000000000001020
So for n states and radius r, enumerate in descending order all (2r+1)-digit numbers in base n and assign each of them a value in {0, ..., n-1}.
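The recipe above can be sketched in a few lines of Python. The function names rule_table and rule_string are made up for illustration, and the sketch assumes the rule number is simply read as a base-n number with one digit per neighborhood, as described above.

```python
from itertools import product

def rule_table(rule, states=3, radius=1):
    """Map each neighborhood (tuple of cell states) to the next state of its center cell."""
    width = 2 * radius + 1
    size = states ** width  # number of neighborhoods, e.g. 3**3 = 27
    assert 0 <= rule < states ** size
    table = {}
    # Enumerate neighborhoods in descending order: (2,2,2), (2,2,1), ..., (0,0,0).
    for hood in product(range(states - 1, -1, -1), repeat=width):
        # Read the neighborhood as a base-`states` number ...
        index = 0
        for cell in hood:
            index = index * states + cell
        # ... and take that digit (counted from the right) of the rule number.
        table[hood] = (rule // states ** index) % states
    return table

def rule_string(rule, states=3, radius=1):
    """Lower row of the rule diagram: one output digit per neighborhood."""
    return "".join(str(d) for d in rule_table(rule, states, radius).values())

print(rule_string(30))            # 3 states, radius 1: 27 digits
print(rule_string(33))
print(rule_string(30, states=2))  # elementary rule 30: 00011110
```

Keeping the table as a dictionary keyed by neighborhood means the same structure can be used directly to step the automaton: look up (left, center, right) to get the new center state.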

2D MATRIX statement for 5 x 5 Grid

You are given a 5 x 5 grid of tiles numbered from 1 to 25 and a set of 5 start-end point pairs.
For each pair, find a path from the start point to the end point.
The paths should meet the conditions below:
a) Only horizontal and vertical moves are allowed.
b) No two paths should overlap.
c) The paths should cover the entire grid.
Input consists of 5 lines.
Each line contains two space-separated integers, the starting and ending point.
Output: Print 5 lines, each consisting of space-separated integers, the path for the corresponding start-end pair. Assume that such a path always exists. In case of multiple solutions, print any one of them.
Sample Input
1 22
4 17
5 18
9 13
20 23
Sample Output
1 6 11 16 21 22
4 3 2 7 12 17
5 10 15 14 19 18
9 8 13
20 25 24 23
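The three conditions can be checked mechanically. Here is a minimal sketch, assuming the tiles are numbered row by row (which the sample output suggests, since 1 -> 6 is a single move); the function name check_solution is made up for illustration, and it only verifies a proposed answer rather than finding one.

```python
def check_solution(pairs, paths, n=5):
    """Return True if `paths` meets conditions (a)-(c) for the given start-end pairs."""
    if len(paths) != len(pairs):
        return False
    seen = set()
    for (start, end), path in zip(pairs, paths):
        # Each path must run from its start tile to its end tile.
        if not path or path[0] != start or path[-1] != end:
            return False
        for tile in path:
            # (b) no tile may be used twice, within one path or across paths.
            if not 1 <= tile <= n * n or tile in seen:
                return False
            seen.add(tile)
        for a, b in zip(path, path[1:]):
            # (a) consecutive tiles must be horizontal or vertical neighbours.
            (ra, ca), (rb, cb) = divmod(a - 1, n), divmod(b - 1, n)
            if abs(ra - rb) + abs(ca - cb) != 1:
                return False
    # (c) together the paths must cover every tile.
    return len(seen) == n * n

pairs = [(1, 22), (4, 17), (5, 18), (9, 13), (20, 23)]
paths = [
    [1, 6, 11, 16, 21, 22],
    [4, 3, 2, 7, 12, 17],
    [5, 10, 15, 14, 19, 18],
    [9, 8, 13],
    [20, 25, 24, 23],
]
print(check_solution(pairs, paths))  # True for the sample output
```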
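I think the problem needs a restriction, or more information, on the input (the start and end points), because with the following input it is not possible to cover the whole grid:
1 22
6 7
11 12
16 17
8 9
With the tiles numbered row by row as in the sample, corner tile 21 is not an endpoint, so a path would have to pass through it using both of its neighbours, 16 and 22, but those belong to different pairs.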

Saving vectors into a matrix matlab

I have a bunch of arrays that I have generated from a loop
Peaks [1, 2, 3, 4, 5]
Latency [23,24,25,26,27] etc.
I want to put all of those in a matrix that will look like that:
Peaks Latency
1 23
2 24
3 25
4 26
5 27
Then I'll want to save this as a text file.
It seems like it would be fairly simple but can't seem to find anything that closely speaks to me right now.
Concatenate:
>> Peaks = [1 2 3 4 5];
>> Latency = [23 24 25 26 27];
>> T = [Peaks(:) Latency(:)]
T =
1 23
2 24
3 25
4 26
5 27
Write:
fileName = 'PeaksLatency.txt';
hdr = {'Peaks','Latency'}
txt = sprintf('%s\t',hdr{:}); txt(end) = [];
dlmwrite(fileName,txt,''); % write header
dlmwrite(fileName,T,'-append','delimiter','\t'); % append data
Here is the code
Peaks = [1, 2, 3, 4, 5].';
Latency = [23,24,25,26,27].';
T = table(Peaks, Latency);
writetable(T,'table.txt', 'Delimiter', '\t');
Note that you need to make Peaks and Latency into column vectors (use .' operator).
Ref: http://www.mathworks.com/help/matlab/ref/writetable.html

separate chaining vs linear probing

Given a set of objects with keys: 12, 44, 13, 88, 23, 94, 11, 39, 20, 16, 5
Write the hash table where M=N=11 and collisions are handled using separate chaining.
h(x) = | 2x + 5 | mod M
So I did it with linear probing and got
11 39 20 5 16 44 88 12 23 13 94
which I am pretty sure is right, but how do you do it with separate chaining? I realize separate chaining uses linked lists, but what would the hash table look like?
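One way to see what the table looks like is to build it. Below is a minimal sketch using the hash function from the question, with Python lists standing in for the linked lists. It assumes each new key is appended to the end of its chain (some textbooks insert at the head instead, which reverses the order inside each chain); the function names are made up for illustration.

```python
def h(x, m=11):
    """Hash function from the question: |2x + 5| mod M."""
    return abs(2 * x + 5) % m

def separate_chaining(keys, m=11):
    """One chain (list) per slot; colliding keys share a slot."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[h(k, m)].append(k)  # assumption: append at the end of the chain
    return table

def linear_probing(keys, m=11):
    """One key per slot; on a collision, try the next slot (wrapping around)."""
    table = [None] * m
    for k in keys:
        i = h(k, m)
        while table[i] is not None:
            i = (i + 1) % m
        table[i] = k
    return table

keys = [12, 44, 13, 88, 23, 94, 11, 39, 20, 16, 5]
for slot, chain in enumerate(separate_chaining(keys)):
    print(slot, " -> ".join(map(str, chain)))
print(linear_probing(keys))
```

So with separate chaining the table is an array of M slots where each slot holds a (possibly empty) list of all the keys that hash there, for example 44, 88 and 11 all end up in slot 5. The linear_probing function is included only for comparison with the row given in the question.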

find and replace values in cell array

I have a cell array like this: [...
0
129
8...2...3...4
6...4
0
I just want to find and replace specific values, but I can't use the ordinary function because the cells are different lengths. I need to replace many specific values at the same time and there is no general function about how values are replaced. However, sometimes several input values should be replaced by the same output.
so I want to say
for values 1:129
'if 0, then 9'
'elseif 1 then 50'
'elseif 2 or 3 or 4 then 61'
etc...up to 129
where these rules are applied to the entire array.
I've tried to work it out myself, but still getting nowhere. Please help!
Since your values appear to span the range 0 to 129, one solution is to add one to these values (so they span the range 1 to 130) and use them as indices into a vector of replacement values. Then you can apply this operation to each cell using the function CELLFUN. For example:
>> C = {0, 129, [8 2 3 4], [6 4], 0}; %# The sample cell array you give above
>> replacement = [9 50 61 61 61 100.*ones(1,125)]; %# A 1-by-130 array of replacement values (I added 125 dummy values)
>> C = cellfun(@(v) {replacement(v+1)},C); %# Perform the replacement
>> C{:} %# Display the contents of C
ans =
9
ans =
100
ans =
100 61 61 61
ans =
100 61
ans =
9
