SimpleTagger based on CRF with mallet - tagging

I want to run the SimpleTagger class in Mallet from Eclipse. I only need to know the order in which to pass the input arguments.
This link explains each argument but not the order (args[0], args[1], etc.).
Also, do you have any idea how long this class takes to run?

There is no fixed order. You only have to make sure that each option is followed by its own value. For example:
--orders 2 --random-seed 1
is an array of 4 command-line arguments. Either --orders or --random-seed can come first; in both cases the random seed is 1 and the list of Markov orders contains a single value, 2.
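To illustrate why the order does not matter, here is a minimal Python sketch of pairing each option with the value that follows it. It is not Mallet's actual parser: `parse_options` is a hypothetical helper, and it assumes every option takes exactly one value.

```python
def parse_options(args):
    """Pair each --option with the value that follows it (hypothetical helper)."""
    options = {}
    for i in range(0, len(args), 2):
        name, value = args[i], args[i + 1]
        if not name.startswith("--"):
            raise ValueError(f"expected an option name, got {name!r}")
        options[name[2:]] = value
    return options

# The same four arguments in two different orders.
first = parse_options(["--orders", "2", "--random-seed", "1"])
second = parse_options(["--random-seed", "1", "--orders", "2"])

# Both orderings produce the same option/value pairs.
print(first == second)
```

In Eclipse, the same four strings would go into the "Program arguments" box of the run configuration, separated by spaces.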

Related

Looping and selecting different value in each line

I have a problem that I think would be solved relatively quickly with a loop. I have to work with SPSS and I think it can only be solved in syntax.
Unfortunately I am not good with loops, so I hope that one of you can help me.
I have done a study on reasons for abortions. Now I would like to present the distribution of reasons.
The problem is that each person was first asked about all her pregnancies (because this is also relevant for the later analysis), and then one pregnancy was selected for the rest of the questionnaire to refer to.
So the rest of the questionnaire was about only one of the pregnancies, whereas the first questions (e.g. year of pregnancy, reason for abortion) were answered for each pregnancy. For the reasons, I only need the information for the pregnancy that the rest of the questionnaire refers to.
I have an index variable ("index") that records the loop pass in which the relevant pregnancy was asked about. Then I have the variables "Loop_1_R" to "Loop_5_R", which hold the reasons for up to 5 abortions (for each woman, of course, only for the number of pregnancies she reported). There are some missing values in between: for example, a woman might report 5 pregnancies of which only two were abortions (e.g. the third and fifth), so she would only give reasons in loop 3 and loop 5.
Now I want to create a new variable that contains only the reason for the relevant pregnancy, so one value per woman. I was thinking you could build a loop that computes the new variable by taking loop i where the index is i.
I could of course do it by hand, but with more than 3000 participants that would obviously take considerably longer.
I hope someone can help me! This is an example dataset with fewer loops and participants:
You can use DO REPEAT to loop over the variables and pick up the value you need:
do repeat vr=Loop_1_R to Loop_5_R/vl=1 to 5.
if Index=vl reason=vr.
end repeat.
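For readers who want to check the logic outside SPSS, the same selection can be sketched in Python. The rows below are made-up sample data and `pick_reason` is a hypothetical helper; only the variable names Index and Loop_1_R to Loop_5_R come from the question.

```python
# Made-up sample rows: one dict per respondent, None marks a missing answer.
rows = [
    {"Index": 3, "Loop_1_R": None, "Loop_2_R": None,
     "Loop_3_R": 4, "Loop_4_R": None, "Loop_5_R": 2},
    {"Index": 1, "Loop_1_R": 7, "Loop_2_R": None,
     "Loop_3_R": None, "Loop_4_R": None, "Loop_5_R": None},
]

def pick_reason(row, n_loops=5):
    """Mirror the DO REPEAT: take Loop_<vl>_R where Index equals vl."""
    reason = None
    for vl in range(1, n_loops + 1):
        vr = row[f"Loop_{vl}_R"]
        if row["Index"] == vl:
            reason = vr
    return reason

for row in rows:
    row["reason"] = pick_reason(row)

print([row["reason"] for row in rows])
```

As in the SPSS version, a respondent whose Index points at a loop with a missing value ends up with a missing reason.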

I've got a pipe that consists of 5 pieces, each including 5 properties

Inlet -> front -> middle -> rear -> outlet
Each of those five properties has a value anywhere between 4 and 40. I want to find combinations where, when a property is summed across the five pipe pieces, every sum is a multiple of 5 (that is, it ends in 0 or 5). There might be hundreds of different pipe pieces, all with different properties.
So if I have all 5 pieces and their summed properties come out as 54,51,23,71,37, that is not good and not what I'm looking for.
Instead, 55,50,25,70,40 would be perfect.
My trouble is that there are so many pieces that it would be insane to do the matching manually, and new ones come up frequently.
I have manually inserted about 100 of these into SQLite already, but they should be easy to convert to Excel or other database formats, so an answer can use anything, such as MySQL or Google Sheets.
I need a calculation that takes every piece into account and either results in "no match" or tells me the id of each piece required for a match; if multiple matches are available, it should list them separately.
Edit: Even just the math needed for this kind of calculation would be a lot of help; I'm not much of a math guy myself. I guess there should be a reference piece that gets checked against every possible scenario.
If the value you want to verify is in A1, round it to the nearest multiple of 5 with: =ROUND(A1/5,0)*5
If the pipes may not be shorter than the given values, round up to the next multiple of 5 with: =CEILING(A1,5)

Finding Shortest Path function in Neo4j applied to real life problems

Questions
1.) For the “General Computing” pathway, which module has highest impact
(i.e., is the compulsory pre-requisite of the most modules)?
2.) If a student fails a particular module in first year, display pathways that
will take minimum 4 years to complete the course (Note: all modules on
the pathway need to be done to complete a course).
Please help
This may work, if I understand your data model:
MATCH (m:Module)<-[r:PRE_REQUISITE]-(:Module)-[:ON]->(pw:Pathway)
WHERE pw.title = 'General Computing' AND r.type = 'Compulsory'
RETURN m, COUNT(*) AS impact
ORDER BY impact DESC
LIMIT 1
You have not provided enough information. We do not know how long each module takes to complete and how often it is offered, whether the level property needs to be considered and how, what other type values there are and what they all really mean, etc. And it seems that one would really want a maximum of 4 years, not a minimum.
I think the answer for the first question should be like this:
match (module)<-[r:PRE_REQUISITE{type: "Compulsory"}]-(m:Module)-[:ON]->(p:Pathway{title: "General Computing"})
return module.title as ModuleName, count(*) as highest_impact
order by highest_impact desc
limit 1

How to repeat a command on different values of the same variable using SPSS LOOP?

Probably an easy question:
I want to run this piece of syntax:
SUMMARIZE
/TABLES=AGENCY
PIN
AGE
GENDER
DISABILITY
MAINSERVICE
MRESAGENCY
MRESSUPPORT
/FORMAT=LIST NOCASENUM TOTAL
/TITLE='Case Summaries'
/MISSING=VARIABLE
/CELLS=COUNT.
for 264 different agencies which are all values contained in the variable 'AGENCY'.
I want to create a different table for each agency outlining the above information for them.
I think I can do this using a DO REPEAT or LOOP on SPSS.
Any advice would be much appreciated.
Thank you :)
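Note: I have Googled and read endless amounts on looping; I am just a little unsure which method is the one I am looking for.
Take a look at SPLIT FILE: after sorting the cases by AGENCY, it makes subsequent procedures such as SUMMARIZE produce separate output for each agency.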

How to isolate numbers separated by stars in Lua?

In some web service, I receive this
"time":"0.301*0.869*1.387*2.93*3.653*3.956*4.344*6.268*6.805*7.712*9.099*9.784*11.071*11.921*13.347*14.253*14.965*16.313*16.563*17.426*17.62*18.114"
I want to separate the numbers and insert them into a table like this. How?
0.301
0.869
1.387
2.93
3.653
3.956
4.344
6.268
6.805
7.712
9.099
9.784
11.071
11.921
13.347
14.253
14.965
16.313
16.563
17.426
17.62
18.114
A little string-matching should get the job done:
local str = [["time":"0.301*0.869*1.387*2.93*3.653*3.956*4.344*6.268*6.805*7.712*9.099*9.784*11.071*11.921*13.347*14.253*14.965*16.313*16.563*17.426*17.62*18.114"]]
local list = {}
for num in str:gmatch("%**(%d+%.%d+)") do
    table.insert(list, tonumber(num))
end
A Little Explanation
I'll first briefly summarize what some of the symbols here are:
%d matches any single digit.
%. matches a literal period.
+ matches 1 or more repetitions of the item before it.
%* matches a literal star.
* without a percent sign in front matches 0 or more repetitions of the item before it.
Now, let's put this together to look at it from the start:
%** matches zero or more literal stars, so a leading star is allowed but optional. It has to be optional because the first number you want does not have a star in front of it.
%d+ matches a run of one or more digits until something else pops up. In our case, this would be the '18' in '18.114' or the '1' in '1.387'.
%. as I said means the next character must be a period.
%d+ matches another run of digits, such as the '114' in '18.114'.
So, what do the parentheses mean? They mark a capture: gmatch returns only the part of the match inside the parentheses, so the star is dropped and only the number is passed to tonumber.
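For comparison, the same extraction can be sketched in Python with a regular expression. This is an illustration in a different language, not part of the Lua answer; Python's `re` syntax uses `\d` and `\.` where Lua patterns use `%d` and `%.`, and the input is shortened to the first four values from the question.

```python
import re

# Shortened version of the string from the question.
raw = '"time":"0.301*0.869*1.387*2.93"'

# \d+\.\d+ finds each run of digits, a period, then digits;
# the stars between numbers are simply never matched.
times = [float(match) for match in re.findall(r"\d+\.\d+", raw)]

print(times)  # [0.301, 0.869, 1.387, 2.93]
```

Like the Lua pattern, this only picks up numbers that contain a decimal point; a bare integer such as 5 would be skipped.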
