LingPipe LDA matrix representation - sparse-matrix

I am trying to extract possible topics from a list of tweets, and LingPipe LDA seems easy to understand and well documented, with code samples.
My challenge is to produce the matrix representation from tweet data. For example:
static String[] WORDS = new String[] {
"river", "stream", "bank", "money", "loan"
};
static final int[][] DOC_WORDS = new int[][] {
{ 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 0, 0, 0 },
{ 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 0, 0 },
{ 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 0 },
{ 0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4 }
};
The zeros at the end of the rows above are supposed to mean that none of the words in the WORDS array was found in the content. However, in this representation a zero is read as index 0, i.e. as if the word 'river' had been found.
As tweets are short, I am not sure how to represent the matrix so that it can also show the 'absence' of a word.
Any advice, or suggestion of another method, is much appreciated.
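One possible way around this, assuming the LDA input is read as one list of token ids per document rather than a fixed-width row (worth checking against the LingPipe documentation), is to let rows have different lengths and simply leave out tokens that are not in the vocabulary. Absence then needs no marker at all. The sketch below is in Python purely for illustration; tweets_to_doc_words and the sample tweets are hypothetical names and data, not part of LingPipe:

```python
# Vocabulary from the question; a word's id is its position in this list.
WORDS = ["river", "stream", "bank", "money", "loan"]
WORD_TO_ID = {word: index for index, word in enumerate(WORDS)}


def tweets_to_doc_words(tweets):
    """Map each tweet to the ids of its in-vocabulary tokens.

    Tokens that are not in WORDS are skipped, so rows may have
    different lengths and no padding value is needed.
    """
    doc_words = []
    for tweet in tweets:
        tokens = tweet.lower().split()  # naive whitespace tokenizer (assumption)
        doc_words.append([WORD_TO_ID[t] for t in tokens if t in WORD_TO_ID])
    return doc_words


tweets = ["Money loan from the bank", "a quiet stream", "nothing relevant here"]
doc_words = tweets_to_doc_words(tweets)
print(doc_words)  # [[3, 4, 2], [1], []]
```

A tweet with no vocabulary words ends up as an empty row, which you may prefer to drop before sampling.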

Loop over all the variables automatically in R

I have a dataset with 3 dependent variables (height, color and habit) and 3 independent variables (rep, block and flowername). I have 3 replications and 20 blocks, with each block repeated 6 times.
data_com<-data[1:18,]
flower<-structure(list(rep = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3), flowername = c("yellow", "orange", "black",
"black1", "orange1", "violet", "violet1", "violet3", "purple",
"purple1", "purple3", "red", "red1", "lila", "sky", "pink", "purple_pink",
"purple_pink1"), height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5), color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7), habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
My model looks like this:
data <- readxl::read_excel("flower.xlsx",sheet = 1)
str(data)
names(data)
data$flowername <- as.factor(data$"flowername")
data$rep <- as.factor(data$"rep")
data$block <- as.factor(data$"block")
data$height <- as.numeric(data$"height")
library(lme4)
model <- lmer(height ~ 1 + (1|flowername) + (1|rep), data = data)
summary(model)
I would like to have a loop that runs over all the dependent variables, one at a time. Later I would like to save the random effects for all variables as a list and as xlsx, so that I can use them for further analysis. I would also like to save the anova output for all dependent variables as xlsx.
I am new to R, and looping seems really difficult for me to understand. Any help would be appreciated.
I am also new to Stack Overflow, so please correct me if the post is not properly formatted. Thank you.
First off, your example data does not work just like that. You have too many (flowername) or too few (rep) levels for the low number of rows. This results in errors when fitting the models. The following example data works, however:
data_com<-data[1:18,]
flower<-structure(list(rep = rep(1:2, 9),
block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3),
flowername = rep(c("yellow", "orange", "black"), 6),
height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5),
color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7),
habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)),
row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
flower$flowername <- as.factor(flower$"flowername")
flower$rep <- as.factor(flower$"rep")
flower$block <- as.factor(flower$"block")
flower$height <- as.numeric(flower$"height")
So, to "automatically" run through your dependent variables, you need to make a function that fits the model and extracts the results that you are interested in:
get.re <- function(dependent, dat) {
require(lme4)
dat$dependent <- dat[[dependent]] # specifies the dependent variable
model <- lmer(dependent ~ 1 + (1|flowername) + (1|rep),
data = dat) # fits the model
cat("\n\n============ Model for dependent:", dependent, "============\n")
print(summary(model)) # shows you the summary
ranef(model) # returns the random effects of the model
}
# make a vector of the dependent variable names
dependents <- c("height", "color", "habit")
# apply the function to each dependent variable
fits.ls <- lapply(dependents,
get.re,
dat = flower)
names(fits.ls) <- dependents # name the list elements
The random effects of each model are given as a list of data frames (one per grouping factor) where the row names are the levels of your random factors. The following code collapses these nested lists into one data frame per model. Then we save these data frames to an xlsx file, using one sheet per model/df.
fits.dfs <- lapply(fits.ls,
function(x) {out <- dplyr::bind_rows(lapply(x,
function(y) data.frame(level = rownames(y),
value = y[,1]) ),
.id = "")
out}
)
library(openxlsx)
wb <- buildWorkbook(fits.dfs)
saveWorkbook(wb, "RandomEffects.xlsx")
Edit:
To keep only the random effects of flowername and put all of them (from all fitted models) into one Excel sheet, transform your output list (fits.ls) as follows. This replaces the last code block:
fits.df <- lapply(fits.ls,
function(x) {dplyr::bind_rows(x$flowername)})
fits.df <- dplyr::bind_cols(fits.df)
colnames(fits.df) <- names(fits.ls)
fits.df <- cbind(rownames(fits.df), fits.df) # flowernames as a column so it is visible in the xlsx
openxlsx::write.xlsx(fits.df, "RandomEffects.xlsx")

numpy arrays: building a 3d array by adding 2d slices one at a time

Looking for some help with numpy and building a 3d array from multiple 2d arrays. I want to make a loop such that on every iteration I make a new 2d array and add it as a new slice to an existing 3d array. Here's my code sample.
import numpy as np
import random
import array
a = np.random.randint(0, 9, size=(10, 10))  # make random 10x10 matrix
b = a  # save copy
a = np.random.randint(0, 9, size=(10, 10))  # make random 10x10 matrix
a.shape
(10, 10)  # verify it's 10x10
b.shape
(10, 10)  # verify it's 10x10
b = np.array([b, a])  # convert two 2d matrix into one 3d matrix
b.shape
(2, 10, 10)  # verify it's a 3d matrix with two planes
a = np.random.randint(0, 9, size=(10, 10))  # make new random 10x10 matrix
b = np.array([b, a])  # add new 2d plane to the 3d matrix
b.shape
(2,)  # should be (3, 10, 10)
Can anyone see what I'm doing wrong?
When you combine arrays by using np.array([...]), they all have to be the same shape. If they aren't, numpy treats them not as numeric arrays to merge, but as opaque Python objects. In the last call, b has shape (2, 10, 10) while a has shape (10, 10). There should have been a warning when you ran the last b = np.array([b, a]):
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Instead, use np.stack:
b = np.stack([*b, a])
*b basically expands the children of b, so the above is equivalent to b = np.stack([b[0], b[1], a])
Or you can use np.vstack (vertical stack):
b = np.vstack([b, a[None]])
a[None] basically wraps a in another array. a.shape == (10, 10), a[None].shape == (1, 10, 10)
Both of the above produce the following:
>>> b.shape
(3, 10, 10)
>>> b
array([[[3, 8, 0, 2, 8, 0, 0, 5, 7, 7],
[0, 5, 2, 8, 8, 2, 1, 4, 5, 8],
[3, 2, 2, 4, 1, 8, 2, 0, 7, 5],
[5, 6, 5, 0, 8, 7, 4, 0, 4, 6],
[6, 2, 3, 7, 4, 3, 6, 6, 4, 8],
[2, 5, 1, 7, 1, 3, 0, 6, 0, 5],
[3, 4, 0, 7, 3, 4, 5, 0, 7, 4],
[0, 7, 2, 8, 7, 7, 4, 3, 2, 6],
[4, 6, 2, 5, 5, 8, 5, 8, 0, 8],
[3, 4, 1, 0, 3, 7, 0, 6, 7, 3]],
[[4, 0, 6, 2, 4, 4, 7, 0, 7, 2],
[5, 8, 5, 8, 2, 8, 3, 7, 4, 6],
[2, 1, 2, 0, 4, 5, 6, 3, 0, 0],
[8, 7, 3, 0, 8, 8, 0, 4, 1, 4],
[0, 2, 5, 7, 5, 3, 0, 5, 1, 7],
[1, 5, 8, 0, 2, 6, 5, 0, 3, 2],
[4, 4, 4, 3, 3, 8, 6, 6, 5, 5],
[5, 3, 6, 8, 0, 3, 0, 8, 8, 3],
[4, 2, 6, 6, 6, 2, 0, 0, 6, 2],
[7, 3, 8, 0, 7, 1, 1, 8, 6, 2]],
[[6, 6, 1, 1, 6, 4, 6, 2, 6, 7],
[0, 5, 6, 7, 5, 0, 0, 5, 8, 2],
[6, 6, 1, 5, 2, 3, 2, 3, 3, 2],
[0, 3, 7, 6, 4, 5, 3, 1, 7, 2],
[7, 6, 3, 0, 1, 7, 8, 3, 8, 5],
[3, 1, 8, 6, 1, 5, 0, 8, 6, 1],
[1, 4, 8, 1, 7, 0, 1, 1, 5, 3],
[2, 1, 4, 8, 2, 3, 1, 6, 8, 7],
[8, 1, 1, 0, 6, 1, 0, 6, 1, 6],
[1, 8, 4, 7, 7, 5, 0, 3, 8, 6]]])
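If the slices are produced in a loop, one common pattern is to collect them in a plain Python list and stack once at the end, which avoids re-copying the growing 3d array on every iteration. A minimal sketch, assuming each slice is a 10x10 integer matrix as in the question (the names slices, rng and n_slices are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, illustrative only
n_slices = 3

# Collect the 2d slices first, then build the 3d array in one call.
slices = []
for _ in range(n_slices):
    slices.append(rng.integers(0, 9, size=(10, 10)))

b = np.stack(slices)
print(b.shape)  # (3, 10, 10)

# Appending one more slice to an existing 3d array:
# new_slice[None] has shape (1, 10, 10), so the shapes line up on axis 0.
new_slice = rng.integers(0, 9, size=(10, 10))
b = np.concatenate([b, new_slice[None]], axis=0)
print(b.shape)  # (4, 10, 10)
```

If the number of slices is known in advance, preallocating with np.empty((n_slices, 10, 10)) and assigning b[i] = ... inside the loop is another option.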

How to reverse multiple lists using the Wolfram-Language

I'm trying to find the best way to solve the question: "Use Range, Reverse and Join to create {3, 2, 1, 4, 3, 2, 1, 5, 4, 3, 2, 1}"
So basically the given lists are {1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2, 3}.
I could easily solve this question, but wanted to know if there is a better (more efficient) way than what I've come up with.
My solutions:
In[136]:= Join[ Reverse[Range[3]], Reverse[Range[4]], Reverse[Range[5]] ]
In[141]:= Reverse[Join[ Range[5], Range[4], Range[3] ]]
given lists: {1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2, 3}, where you have to use the functions Range, Reverse and Join to create the expected output:
{3, 2, 1, 4, 3, 2, 1, 5, 4, 3, 2, 1}
My solution would not be efficient if there were 100 lists instead of three.
Thanks in advance for the help.
RUNNING EACH ELEMENT OF A LIST THROUGH A FUNCTION:
listA = {}
Function[x, listA = Join[listA, x]] /@ {Range[5], Range[4], Range[3]}
listB = Reverse[listA]
Clear[listA]
output:
result -> listB: {3, 2, 1, 4, 3, 2, 1, 5, 4, 3, 2, 1}
Range[#] & /* Reverse /@ {3, 4, 5} // Flatten
{3, 2, 1, 4, 3, 2, 1, 5, 4, 3, 2, 1}
Update
Someone voted to delete my answer without providing a reason. Perhaps because it did not use Join. To address that:
Range[#] & /* Reverse /@ {3, 4, 5} // Apply[Join]

Looping through a collection and deleting things on the way

I want to go through a collection and find the first pair of matching elements, but my current approach is having trouble with the indexing going out of bounds all the time.
Here's a simplified MWE:
function processstuff(stuff)
for pointer1 in 1:length(stuff)
for pointer2 in pointer1:length(stuff)
println("$(stuff)")
pointer1 == pointer2 && continue
if stuff[pointer1] == stuff[pointer2]
# items match, remove them
deleteat!(stuff, pointer1)
deleteat!(stuff, pointer2)
end
end
end
end
processstuff(collect(rand(1:5, 20)))
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
ERROR: LoadError: BoundsError: attempt to access 16-element Array{Int64,1} at index [17]
(Obviously this example is just comparing two numbers, the real comparison isn't.)
The idea of updating the collection of stuff by removing both elements that have been processed looks like it works, because I think Julia updates the iteration thing each time through. But only for a while...?
You can use the following approach (assuming you want to remove pairs):
function processstuff!(stuff)
pointer1 = 1
while pointer1 < length(stuff)
for pointer2 in pointer1+1:length(stuff)
if stuff[pointer1] == stuff[pointer2]
deleteat!(stuff, (pointer1, pointer2))
pointer1 -= 1 # correct pointer location as we later add 1 to it
break
end
end
pointer1 += 1
end
end
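There were several problems in your code:
you called deleteat! twice; the first call shifts all later elements down by one, so pointer2 no longer points at the matching element and can run past the end
your inner loop kept running after a deletion, so it could try to delete at pointer1 several times
the ranges 1:length(stuff) are computed once when each loop starts, so they do not shrink along with stuff; that is why I use while in the outer loop, to track the changing size of stuff dynamically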
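The same idea can be sketched outside Julia. The following is a rough Python version of the approach above, not a translation of any library routine; remove_matching_pairs is a hypothetical name, and equality stands in for the real comparison. Note that Python indices start at 0, and that the higher index is deleted first so the lower one stays valid:

```python
def remove_matching_pairs(items):
    """Remove pairs of equal elements from items in place and return it."""
    i = 0
    # Re-check the length on every pass, because the list shrinks as we delete.
    while i < len(items) - 1:
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                del items[j]  # delete the higher index first
                del items[i]  # index i is unaffected by the deletion above
                break         # restart the search at the same position i
        else:
            i += 1            # no partner found for items[i], move on
    return items


print(remove_matching_pairs([1, 4, 3, 3, 2, 4]))  # [1, 2]
```

The for/else only advances i when no match was found, which plays the role of the pointer1 -= 1 correction in the Julia version.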

Why is Flash doing array operations wrongly?

It was running fine and then it threw me this error:
1125 Error #: 117 index is beyond the scope of 115.
It doesn't list a line number, but the function below is the only place where a long array is referred to.
The error means it's trying to access an index beyond the end of the vector. That shouldn't be possible.
Relevant code parts (the rest, public functions and other functions not included, all work fine):
public class Main extends Sprite
{
internal var oneoff:Boolean = true;
internal var kanaList:Vector.<String> = new <String>["あ/ア", "あ/ア", "え/え", "え/え", "い/イ", "い/イ", "お/オ", "お/オ", "う/ウ", "う/ウ", "う/ウ", "う/ウ", "か/カ", "か/カ", "け/ケ", "け/ケ", "き/キ", "き/キ", "く/ク", "く/ク", "こ/コ", "こ/コ", "さ/サ", "さ/サ", " し/シ", " し/シ", "す/ス", "す/ス", "そ/ソ", "そ/ソ", "す/ス", "す/ス", "た/タ", "た/タ", "て/テ", "て/テ", " ち/チ", " ち/チ", "と/ト", "と/ト", "つ/ツ", "つ/ツ", "ら/ラ", "ら/ラ", "れ/レ", "れ/レ", "り/リ", "り/リ", "ろ/ロ", "ろ/ロ", "る/ル", "る/ル", "だ/ダ", "で/デ", "じ/ジ", "ど/ド", "ず/ズ", "ざ/ザ", "ぜ/ゼ", "ぞ/ゾ", "な/ナ", "ね/ネ", "に/二", "の/ノ", "ぬ/ヌ", "じゃ/ジャ", "じゅ/ジュ", "じょ/ジョ", "ん/ン", "しゃ/シャ", "しゅ/シュ", "しょ/ショ", "や/ヤ", "ゆ/ユ", "よ/ヨ", "は/ハ", "ひ/ヒ", "ふ/フ", "へ/ヘ", "ほ/ホ", "ば/バ", "ば/バ", "ぶ/ブ", "ぶ/ブ", "び/ビ", "び/ビ", "ぼ/ボ", "ぼ/ボ", "べ/ベ", "べ/ベ", "ぱ/パ", "ぴ/ピ", "ぷ/プ", "ぺ/ペ", "ぽ/ポ", "ま/マ", "み/ミ", " む/ム", "め/メ", "も/モ", "を/ヲ", "みゃ/ミャ", "みゅ/ミャ", "みょ/ミョ", "きゃ/キャ", "きゅ/キュ", "きょ/キョ", "にゃ/ニャ", "にゅ/ニュ", "にょ/ニョ", "びゃ/びゃ", "びゅ/ビュ", "びょ/ビョ", "  ひゃ/ヒャ", "ひゅ/ヒュ", "ひょ/ヒョ", "ぴゃ/ピャ", "ぴゅ/ピュ", "ぴょ/ピョ", "っ/ッ", "っ/ッ"];
internal var valueList:Vector.<uint>= new <uint>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 10, 10, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 20, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1, 1];
// Lists of Kana that can be replaced in the replace mode and the substitute Kana and Values
internal var selectghostList:Vector.<String>=new<String>["ま/マ","む/ム","も/モ","か/カ","く/ク","こ/コ","な/ナ","ぬ/ヌ","の/ノ","ば/バ","ぶ/ブ","ぼ/ボ","は/ハ","ふ/フ","ほ/ホ","ぱ/パ","ぷ/プ","ぽ/ポ"];
internal var selectkanaList:Vector.<String>=new <String>["みゃ/ミャ", "みゅ/ミャ", "みょ/ミョ", "きゃ/キャ", "きゅ/キュ", "きょ/キョ", "にゃ/ニャ", "にゅ/ニュ", "にょ/ニョ", "びゃ/びゃ", "びゅ/ビュ", "びょ/ビョ", "  ひゃ/ヒャ", "ひゅ/ヒュ", "ひょ/ヒョ", "ぴゃ/ピャ", "ぴゅ/ピュ", "ぴょ/ピョ"];
internal var selectghostvalueList:Vector.<uint>=new <uint>[2, 2, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2];
//Start list of playerHand contents as I don't know if Null is 0
internal var playernumber:uint;
internal var allplayersHand:Array = [[0], [0], [0], [0],[0], [0]];
internal var playerRound:uint = 1;
internal var round:uint = 1;
internal var aplayersHand:Array;
internal function create():void
{ var listLength:uint;
var row:uint
listLength = kanaList.length;
aplayersHand = allplayersHand[playerRound];
for (var i:uint = (aplayersHand.length); i <= 7; i+=1)
{row = int(Math.random() * listLength);  
trace (row);
trace(i);
aplayersHand[i] = [0, kanaList[row], valueList[row]];
trace (aplayersHand);
trace (aplayersHand[i]);
kanaList.splice(row,1);
valueList.splice(row,1);
}
deal();
}
I'm assuming the error is thrown intermittently. I think it happens because you store the long vector's length in listLength once, before the loop, but never decrement that value after
kanaList.splice(row,1);
valueList.splice(row,1);
which is why, I think, the row value calculated like
row = int(Math.random() * listLength);
would sometimes be greater than or equal to the vector's current length at that iteration, i.e. an index past its end.
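To illustrate the fix in a language-neutral way, here is a small Python sketch (not ActionScript, and deal_hand is a hypothetical name) in which the random index is bounded by the list's current length on every draw instead of by a length cached before the loop:

```python
import random


def deal_hand(deck, hand_size, rng):
    """Draw hand_size items from deck without replacement, mutating deck."""
    hand = []
    for _ in range(hand_size):
        # Recompute the bound from the shrinking list on every draw;
        # a length cached before the loop would eventually be too large.
        row = rng.randrange(len(deck))
        hand.append(deck.pop(row))
    return hand


deck = list(range(121))
hand = deal_hand(deck, 7, random.Random(0))
print(len(hand), len(deck))  # 7 114
```

In the original code the equivalent change would be to use the vector's current length when computing row, or to decrement listLength after each splice.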
As a side note, it would help to see everything that was traced up to the point where you got the error. Also, the exception should show a stack trace if you compile a debug version of the SWF and run it in a debug Flash Player. The stack trace is very useful for tracking down bugs like these.
