Topic Modeling: How to use LDA in C on the example data?

I want to try the lda-c code by Blei et al., as provided at this link.
I have compiled the code, and when I run ./lda in my terminal, the following result is displayed.
usage : lda est [initial alpha] [k] [settings] [data] [random/seeded/manual=filename/*] [directory]
lda inf [settings] [model] [data] [name]
This means that it has compiled correctly.
However, despite reading the README.txt file there, I have not been able to run the LDA code successfully.
It either reports Segmentation fault (core dumped) or is killed.
What am I missing? How do I use it on the example data they provide?
I have read the Stack Overflow answer to the question asked here, but it was not useful because I don't know the default values.
P.S.: I am a beginner.

Are you using ap.txt instead of ap.dat by any chance? lda-c doesn't take raw sentences or marked-up data as input; it takes a bag-of-words representation of each document, one document per line. When ap.dat has a line like
186 0:1 6144:1 3586:2 ..., it means that the corresponding document has 186 distinct words, word 0 appears once, word 6144 appears once, word 3586 appears twice, and so on.
This command works for me (using Blei's original code):
./lda est 0.1 10 settings.txt ap.dat random modeldir
(Feel free to tweak the initial alpha (0.1) and number of topics (10) as you wish.)
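If you later want to run lda-c on your own text, the input format described above can be produced with a few lines of code. The following is only a minimal sketch in Python, not part of lda-c itself; the function names build_vocab and to_ldac_line and the toy documents are made up for illustration, and it assumes your documents are already tokenized.

```python
from collections import Counter

def build_vocab(docs):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_ldac_line(tokens, vocab):
    """Format one document as '<distinct words> <id>:<count> <id>:<count> ...'."""
    counts = Counter(vocab[tok] for tok in tokens)
    # Sorting by word id is only for reproducible output; ap.dat itself is not sorted.
    pairs = " ".join(f"{idx}:{n}" for idx, n in sorted(counts.items()))
    return f"{len(counts)} {pairs}"

# Hypothetical toy corpus: two already-tokenized documents.
docs = [["topic", "model", "topic"], ["model", "data"]]
vocab = build_vocab(docs)
lines = [to_ldac_line(doc, vocab) for doc in docs]
print("\n".join(lines))
```

Writing these lines to a file gives something you can pass in place of ap.dat in the command above.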

Related

"? Out of data error in 60" on my C64 mini in basic

I have been typing in "Duel" from the book "Sixty Programmes for the Commodore 64" (by R. Erskine et al.) on my C64 Mini in BASIC. I keep getting the following error: "? Out of data error in 60". I've checked the code for typos and can't find any. Has anyone else had this problem, and do you have a fix? Thank you.
Lines 5-60:
5 REM *** D U E L *** # MICHAEL BEWS
*** TRANSLATED BY IAN YATES
10 V-53248:X=RND(-TI):POKEV+32,4:POKEV+33,5:POKEV+24,23:POKE650,255:M20
20 Y$="String of C64 Characters":X$="String of C64 Characters
30 PRINT"String of C64 CharactersPLEASE WAIT WHILE USER-DEFINED",,"CHARACTERS ARE SET UP."
40 POKE52,48:POKE56,48:POKE56334,PEEK(56334)AND254:POKE1,PEEK(1)AND251
50 FORX=14336TO15143:POKEX,PEEK(X+40960):NEXT:FORX=1TO30:READA:NEXT
60 FORX=15144To15247:READA:POKEX,A:NEXT:M$="String of C64 Characters":N$="String of C64 Characters"
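DATA is a way of feeding a sequence of values into a BASIC program. The number of values in the DATA statements must be greater than or equal to the number of times READ is executed. If READ runs out of DATA values, it raises an "Out of Data" error.
In this case, lines 50 and 60 alone read 134 values: 30 in the second loop on line 50, plus 104 on line 60 (15144 to 15247 inclusive). The DATA statements must supply at least that many values, separated by commas and spread over one or more DATA lines. Because the error is reported in line 60, the READ there ran out early, which usually means a DATA line is missing or short. The end of line 50 is also somewhat odd: it reads 30 values into A without doing anything with them, so that loop only skips over data.
Check the listing in your source against what you typed for misprints or missing lines. If you find none, try removing the second FOR loop from line 50.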

Difference between YAP and SWI-Prolog when reading canonical lists

I have the following test code that tries to read a file into a list:
open('raw250-split1.pl', read, Stream),
read(Stream,train_xs(TrainXs)),
length(TrainXs, MaxTrain).
I will omit part of the output because the file is quite large.
It works well with YAP:
➜ chill git:(master) ✗ yap [18/06/19| 5:48PM]
% Restoring file /usr/lib/Yap/startup.yss
YAP 6.2.2 (x86_64-linux): Sat Sep 17 13:59:03 UTC 2016
?- open('raw250-split1.pl', read, Stream),
read(Stream, train_xs(TrainXs)),
length(TrainXs, MaxTrain).
MaxTrain = 225,
Stream = '$stream'(3),
TrainXs = [[parse([which,rivers,run,through,states,bordering,new,mexico,/],answer(_A,(river(_A),traverse(_A,_B),next_to(_B,_C),const(_C,stateid('new mexico')))))],
<omitted output>
,[parse([what,is,the,largest,state,capital,in,population,?],answer(_ST,largest(_SU,(capital(_ST),population(_ST,_SU)))))]]
But on SWI-Prolog, it produces a type error:
➜ chill git:(master) ✗ swipl [18/06/19| 7:24PM]
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- open('raw250-split1.pl', read, Stream),
read(Stream, train_xs(TrainXs)),
length(TrainXs, MaxTrain).
ERROR: raw250-split1.pl:4:
Type error: `list' expected, found `parse(which.(rivers.(run.(through.(states.(bordering.(new.(mexico.((/).[])))))))),
<omitted output>
,answer(_67604,(state(_67604),next_to(_67604,_67628),const(_67628,stateid(kentucky))))).[].(parse(what.((is).(the.(largest.(state.(capital.(in.(population.((?).[])))))))),answer(_67714,largest(_67720,(capital(_67714),population(_67714,_67720))))).[].[]))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))' (a compound)
In:
[10] throw(error(type_error(list,...),context(...,_67800)))
[7] <user>
Note: some frames are missing due to last-call optimization.
Re-run your program in debug mode (:- debug.) to get more detail.
What might be causing this error?
The file raw250-split1.pl can be found at the FTP URL below, if you'd like to try it.
Thank you for the help!
I am trying to migrate older code to SWI-Prolog; it was written in
SICStus 3 #3: Thu Sep 12 09:54:27 CDT 1996 or earlier
by Raymond J. Mooney ftp://ftp.cs.utexas.edu/pub/mooney/chill/.
All my questions with this tag are related to this task. I'm new to Prolog; help and suggestions are welcome!
The raw250-split1.pl file was apparently written using canonical notation. The traditional list functor is ./2, but SWI-Prolog 7.x changed it to '[|]'/2 in order to use ./2 for other purposes. As a result, the read/2 call instantiates the variable TrainXs to a compound term that is not a list:
?- open('raw250-split1.pl', read, Stream), read(Stream,train_xs(TrainXs)).
Stream = <stream>(0x7f8975e08e90),
TrainXs = parse(which.(rivers.(run.(through.(states.(bordering.(... . ...)))))), answer(_94, (river(_94), traverse(_94, _100), next_to(_100, _106), const(_106, stateid('new mexico'))))).[].(parse(what.((is).(the.(highest.(point.(... . ...))))), answer(_206, (high_point(_204, _206), const(_204, stateid(montana))))).[].(parse(what.((is).(the.(most.(... . ...)))), answer(_298, largest(_300, (population(_298, _300), state(...), ..., ...)))).[].(parse(through.(which.(states.(... . ...))), answer(_414, (state(_414), const(..., ...), traverse(..., ...)))).[].(parse(what.((is).(... . ...)), answer(_500, longest(_500, river(...)))).[].(parse(how.(... . ...), answer(_566, (..., ...))).[].(parse(... . ..., answer(..., ...)).[].(parse(..., ...).[].(... . ... .(... . ...))))))))).
YAP still uses the ./2 functor for lists, which explains why it can handle it. A workaround for SWI-Prolog is to start it with the --traditional command-line option:
$ swipl --traditional
...
?- open('raw250-split1.pl', read, Stream), read(Stream,train_xs(TrainXs)).
Stream = <stream>(0x7faeb2f77700),
TrainXs = [[parse([which, rivers, run, through, states, bordering|...], answer(_94, (river(_94), traverse(_94, _100), next_to(_100, _106), const(_106, stateid('new mexico')))))], [parse([what, is, the, highest, point|...], answer(_206, (high_point(_204, _206), const(_204, stateid(montana)))))], [parse([what, is, the, most|...], answer(_298, largest(_300, (population(_298, _300), state(...), ..., ...))))], [parse([through, which, states|...], answer(_414, (state(_414), const(..., ...), traverse(..., ...))))], [parse([what, is|...], answer(_500, longest(_500, river(...))))], [parse([how|...], answer(_566, (..., ...)))], [parse([...|...], answer(..., ...))], [parse(..., ...)], [...]|...].
The type error you get comes from length/2, which expects a list when its first argument is bound.
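To make the functor distinction concrete, here is a small language-neutral sketch written in Python rather than Prolog. The Term class and the to_list helper are hypothetical and only model the idea: the same nested term counts as a list when the reader treats '.' as the list constructor, and as an ordinary compound term when the list constructor is '[|]'.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """A compound term: a functor name plus a tuple of arguments."""
    functor: str
    args: tuple

NIL = "[]"  # stand-in for the empty list

def to_list(term, list_functor):
    """Return the elements if term is a proper list built with list_functor, else None."""
    items = []
    while isinstance(term, Term) and term.functor == list_functor and len(term.args) == 2:
        head, term = term.args
        items.append(head)
    return items if term == NIL else None

# The canonical form .(a, .(b, .(c, []))) as it appears in the file.
dotted = Term(".", ("a", Term(".", ("b", Term(".", ("c", NIL))))))

print(to_list(dotted, "."))    # traditional systems: a three-element list
print(to_list(dotted, "[|]"))  # SWI-Prolog 7 default: not a list
```

The second call returning None corresponds to the situation in which length/2 raises the type error.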
There is a tilde as the last character in that file, which makes the syntax invalid, so you should remove it before reading. I don't know why YAP accepts the file as valid; it should raise an error, AFAIK.
There is a read option dotlists/1 in SWI-Prolog:
dotlists(Bool)
If true (default false), read .(a,[]) as a
list, even if lists are internally not constructed
using the dot as functor. This is primarily intended
to read the output from write_canonical/1 from
other Prolog systems. See section 5.1.
http://www.swi-prolog.org/pldoc/man?predicate=read_term/2
This gives you the desired result, without changing the mode:
Welcome to SWI-Prolog (threaded, 64 bits, version 8.1.0)
?- read_term(X, [dotlists(true)]).
|: .(a,.(b,.(c,[]))).
X = [a, b, c].

Emacs - GDB trace right to interrupt without stepping through all files

I am working on the Pintos OS project. I get this message:
Page fault at 0xbfffefe0: not present error writing page in user context.
The problem with the Pintos OS project is that it doesn't simply report the line and function that caused the exception.
I know how to use breakpoints, watchpoints, etc., but is there a way to go straight to the failure without stepping through the whole flow and all the OS files line by line, so that I can find the line that caused the exception and put a breakpoint there? I looked at the GDB commands but didn't find anything.
When I debug this project I have to step through the whole program until I find that error/exception, which is very time-consuming. There is probably a faster way to do this.
Thanks.
Whole trace:
nestilll@vdebian:~/Class/pintos/proj-3-bhling-nestilll-nsren/src/vm/build$ pintos -v -k -T 60 --qemu --gdb --filesys-size=2 -p tests/vm/pt-grow-pusha -a pt-grow-pusha --swap-size=4 -- -q -f run pt-grow-pusha
Use of literal control characters in variable names is deprecated at /home/nestilll/Class/pintos/src/utils/pintos line 909.
Prototype mismatch: sub main::SIGVTALRM () vs none at /home/nestilll/Class/pintos/src/utils/pintos line 933.
Constant subroutine SIGVTALRM redefined at /home/nestilll/Class/pintos/src/utils/pintos line 925.
warning: disabling timeout with --gdb
Copying tests/vm/pt-grow-pusha to scratch partition...
qemu -hda /tmp/N2JbACdqyV.dsk -m 4 -net none -nographic -s -S
PiLo hda1
Loading............
Kernel command line: -q -f extract run pt-grow-pusha
Pintos booting with 4,088 kB RAM...
382 pages available in kernel pool.
382 pages available in user pool.
Calibrating timer... 419,020,800 loops/s.
hda: 13,104 sectors (6 MB), model "QM00001", serial "QEMU HARDDISK"
hda1: 205 sectors (102 kB), Pintos OS kernel (20)
hda2: 4,096 sectors (2 MB), Pintos file system (21)
hda3: 98 sectors (49 kB), Pintos scratch (22)
hda4: 8,192 sectors (4 MB), Pintos swap (23)
filesys: using hda2
scratch: using hda3
swap: using hda4
Formatting file system...done.
Boot complete.
Extracting ustar archive from scratch device into file system...
Putting 'pt-grow-pusha' into the file system...
Erasing ustar archive...
Executing 'pt-grow-pusha':
(pt-grow-pusha) begin
Page fault at 0xbfffefe0: not present error writing page in user context.
pt-grow-pusha: dying due to interrupt 0x0e (#PF Page-Fault Exception).
Interrupt 0x0e (#PF Page-Fault Exception) at eip=0x804809c
cr2=bfffefe0 error=00000006
eax=bfffff8c ebx=00000000 ecx=0000000e edx=00000027
esi=00000000 edi=00000000 esp=bffff000 ebp=bfffffa8
cs=001b ds=0023 es=0023 ss=0023
pt-grow-pusha: exit(-1)
Execution of 'pt-grow-pusha' complete.
Timer: 71 ticks
Thread: 0 idle ticks, 63 kernel ticks, 8 user ticks
hda2 (filesys): 62 reads, 200 writes
hda3 (scratch): 97 reads, 2 writes
hda4 (swap): 0 reads, 0 writes
Console: 1359 characters output
Keyboard: 0 keys pressed
Exception: 1 page faults
Powering off...
To have the GDB debugger run and stop at the desired location:
gdb filename <-- start the debug session
br main <-- set a breakpoint at the first line of the main() function
r <-- run until that breakpoint is reached
br filename.c:linenumber <-- set another breakpoint at the desired line of code
c <-- continue until the second breakpoint is encountered
The debugger will stop at the desired location in the file, if execution ever actually gets there.
When I debug this project I have to step through the whole program
until I find what caused error/exception which is very time consuming.
There is probably a faster way to do this.
Normally what you would do is set a breakpoint just before the error. Then your program will run at full speed, without your intervention, until it reaches that point.
There are several wrinkles here.
First, sometimes it is difficult to know where to put the breakpoint. In this case I suppose I would look for the code that is printing the message, then work backward from there. Sometimes you have to stop at the failure point, examine the stack, set a new breakpoint further up, and re-run the program.
Then there are the mechanics of setting the breakpoint. One simple way is to break by function name, like break my_function. Another is to use the file name and line number, like break my_file.c:73.
Finally, sometimes a breakpoint can be hit many times before the failure is seen. You can use ignore counts (see help ignore) or conditional breakpoints (like break my_function if variable == 27) to limit the number of stops.
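The register dump in the question already narrows down where to put that breakpoint. As a hedged illustration (the helper name decode_page_fault is made up), the sketch below decodes the x86 page-fault error code from the trace and compares the faulting address (cr2) with the stack pointer (esp). Bits 0, 1 and 2 of the error code mean "page present", "write access" and "user mode" respectively.

```python
def decode_page_fault(error_code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(error_code & 0x1),  # 0 means the page was not present
        "write": bool(error_code & 0x2),    # 1 means the access was a write
        "user": bool(error_code & 0x4),     # 1 means the fault came from user mode
    }

# Values copied from the trace: error=00000006, cr2=bfffefe0, esp=bffff000.
info = decode_page_fault(0x00000006)
esp = 0xBFFFF000
cr2 = 0xBFFFEFE0
distance_below_esp = esp - cr2

print(info)
print(distance_below_esp)
```

The decoded flags match the "not present error writing page in user context" message, and the faulting address lies 32 bytes below esp, so the failing access is a user-mode write just under the current stack. The eip value in the dump (0x804809c) is the user instruction that faulted, so there is no need to step toward it; a breakpoint on the kernel's page-fault handler should stop at the interesting moment. In the stock Pintos sources that handler is page_fault in userprog/exception.c, but treat that name as an assumption about your tree.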

LPCXpresso error CreateProcess: No such file or directory

I know there are multiple questions on this subject, but they did not help.
Whatever I try to compile, I keep getting the same error:
arm-none-eabi-gcc.exe: error: CreateProcess: No such file or directory
I guess it means that it cannot find the compiler.
I have tried tweaking the PATH settings:
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\nxp\LPCXpresso_7.6.2
326\lpcxpresso\tools\bin;
Seems to be right?
I have tried using Sysinternals Process Monitor.
I can see that a lot of arm-none-eabi-gcc.exe operations get a result of name not found, but there are a lot of successful results too.
I have also tried reinstalling the compiler and LPCXpresso, with no luck.
If I type arm-none-eabi-gcc -v I get the version, so the compiler itself runs,
but when I try to compile in CMD like this: arm-none-eabi-gcc led.c
I get the same error as stated above:
arm-none-eabi-gcc.exe: error: CreateProcess: No such file or directory
I tried playing around more with PATH in the environment variables, with no luck. I feel like something is stopping LPCXpresso from finding the compiler.
The only antivirus this computer has is Avira, and I disabled it. I also allowed the compiler and LPCXpresso through the firewall.
I have tried some more things; I will add them shortly, after trying to reproduce the test.
It seems your problem is a happy mess with Vista and GCC. Long story short, a CRT function, access, behaves differently on Windows and Linux. This difference is actually mentioned in Microsoft's documentation, but the GCC folks didn't notice. This leads to a bug on Vista because this version of Windows is stricter on this point.
This bug is mentioned here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33281
I have no proof that your problem comes from here, but the chances are good.
The solutions:
1. Do not use Vista
2. Recompile arm-none-eabi-gcc.exe with the flag -D__USE_MINGW_ACCESS
3. Patch arm-none-eabi-gcc.exe
The 3rd is the easiest, but it's a bit tricky. The goal is to hijack the access function and add an instruction to prevent undesired behavior. To patch your gcc, you have two solutions : you upload your .exe and I patch it for you, or I give you the instructions to patch it yourself. (I can also patch it for you, then give the instructions if it works). The patching isn't really hard and doesn't require advanced knowledge, but you must be rigorous.
As I said, I don't have this problem myself, so I don't know if my solution really works. The patch seems to be working for this problem.
EDIT2:
The exact problem is that the Linux access function has a flag for checking whether a file is executable. The Windows access function cannot check for this. Most Windows versions simply ignore this flag and check whether the file exists instead, which usually gives the same result. The problem is that Vista doesn't ignore it: whenever access is used to check for executability, it returns an error. This leads GCC programs to think that some executables are missing. The patch enabled by -D__USE_MINGW_ACCESS, or done manually, removes the flag when access is called, so that it checks for existence instead, just like on other Windows versions.
EDIT:
The patching is actually needed for every GCC program that invokes other executables, not only gcc.exe. So far these are only gcc.exe and collect2.exe.
Here are the patching instructions:
1. Back up your arm-none-eabi-gcc.exe.
2. Download and install CFF Explorer (direct link here).
3. Open arm-none-eabi-gcc.exe with CFF Explorer.
4. On the left panel, click on Import Directory.
5. In the module list that appears, click on the msvcrt.dll line.
6. In the import list that appears, find _access. Be careful here: the list is long, and there are multiple _access entries. The last one (the very last entry for me) is probably the right one.
7. When you click on the _access line, an address should appear in the list header, in the 2nd column, 2nd row, just below FTs(IAT). Write down that address in Notepad (for me it is 00180948; yours may be different). I will refer to this address as F.
8. On the left panel, click on Address Converter.
9. Three fields should appear; enter address F in the File Offset field.
10. Write down in Notepad a 6-byte value: the first two bytes are FF 25, and the last 4 are the address that appeared in the VA field, IN REVERSE. For example, if 00586548 appeared in the VA field, write down FF 25 48 65 58 00 (spaces added for legibility). I will refer to this value as J. This value J is the instruction that jumps to the _access function.
11. On the left panel, click on Section Headers.
12. In the section list that appears on the right, click on the .text line (the .text section is where the code resides).
13. In the editor panel that appears below, click on the magnifier and, in the Hex search bar, search for a run of eleven 90 bytes (9090909090...; 90 is NOP in assembly). This is to find a code cave (unused space) in which to insert the patch, which is 11 bytes long. Once you have found a code cave, write down the offset of the first 90. The exact offset is displayed at the very bottom as Pos : xxxxxxxx. I will refer to this offset as C.
14. Use the editor to change the run of eleven 90 bytes: the first 5 bytes become 80 64 E4 08 06. These 5 bytes are the instruction that prevents the wrong behavior. The last 6 bytes become the value J (e.g. FF 25 48 65 58 00), to jump back to the _access function.
15. Click on the arrow icon (Go To Offset) a bit below, and enter 0 to navigate to the beginning of the file.
16. Use the Hex search bar again to search for value J. If you land on the bytes you just modified, skip them and keep searching. The J value you need is surrounded by many values containing FF 25 and 90 90; that is the DLL jump table. Write down the offset of the value J you found (the offset of the first byte, FF). I will refer to this offset as S. Note 1: If you can't find the value, maybe you picked the wrong _access in step 6, or did something wrong between steps 6 and 10. Note 2: The search bar doesn't wrap around when it hits the end; go to offset 0 manually to search again.
17. Use a hexadecimal 32-bit two's-complement calculator (like this one: calc.penjee.com) to calculate C - S - 5. If your offset C is 8C0 and your offset S is 6D810, you must obtain FF F9 30 AB (8C0 minus 6D810, minus 5).
18. Replace the value J you found in the file (at step 16) with 5 bytes: the first byte is E9, and the last 4 are the result of the previous step, IN REVERSE. If you obtained FF F9 30 AB, you must replace the value J (e.g. FF 25 48 65 58 00) with E9 AB 30 F9 FF. The 6th byte of J can be left untouched. These 5 bytes are the jump to the patch.
19. File -> Save
Notes: You should have modified a total of 16 bytes. If the patched program crashes, you did something wrong: even if the patch doesn't fix your problem, a correctly applied patch can't cause a crash.
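If you would rather not do the byte reversal and the two's-complement subtraction by hand, the two values can be computed with a short script. This is only a sketch in Python under the same assumptions as the instructions above; the function names are made up, and the inputs are the example numbers from the steps, not values from your own file.

```python
import struct

def indirect_jump_bytes(va):
    """Value J: FF 25 followed by the 32-bit VA in reversed (little-endian) byte order."""
    return b"\xFF\x25" + struct.pack("<I", va)

def relative_jump_bytes(cave_offset, jump_offset):
    """E9 followed by C - S - 5 as a signed 32-bit value in reversed byte order.

    The 5 is the length of the E9 jump itself: the displacement is counted
    from the end of that instruction.
    """
    displacement = cave_offset - jump_offset - 5
    return b"\xE9" + struct.pack("<i", displacement)

def spaced_hex(data):
    """Format bytes the way the instructions write them, e.g. 'FF 25 48'."""
    return " ".join(f"{byte:02X}" for byte in data)

# Example numbers from the instructions: VA 00586548, C = 8C0, S = 6D810.
value_j = spaced_hex(indirect_jump_bytes(0x00586548))
jump_to_patch = spaced_hex(relative_jump_bytes(0x8C0, 0x6D810))
print(value_j)
print(jump_to_patch)
```

With the example numbers this prints FF 25 48 65 58 00 and E9 AB 30 F9 FF, the same bytes as in the instructions. Substitute your own VA, C and S.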
Let me know if you have difficulties somewhere.

How can I run this DTrace script to profile my application?

I was searching online for something to help me profile at the level of individual assembly instructions. I searched and found something at http://www.webservertalk.com/message897404.html
There are two parts to this problem: finding all instructions of a particular type (inc, add, shl, etc.) to determine groupings, and then figuring out which are getting executed and summing correctly. The first bit is tricky unless grouping by disassembler is sufficient. For figuring out which instructions are being executed, DTrace is of course your friend here (at least in userland).
The nicest way of doing this would be to instrument only the beginning of each basic block; finding these would be a manual process right now. However, instrumenting each instruction is feasible for small applications. Here's an example:
First, our quite trivial C program under test:
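#include <unistd.h>
int main(void)
{
    int i;
    /* call getpid() 100 times so the loop body stands out in the counts */
    for (i = 0; i < 100; i++)
        getpid();
}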
Now, our slightly tricky D script:
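#pragma D option quiet
/* on first entry to each function, remember its starting address */
pid$target:a.out::entry
/address[probefunc] == 0/
{
    address[probefunc] = uregs[R_PC];
}
/* fire on every instruction probe and count hits by function, offset and PC */
pid$target:a.out::
/address[probefunc] != 0/
{
    @a[probefunc, (uregs[R_PC] - address[probefunc]), uregs[R_PC]] = count();
}
/* print the aggregation when tracing ends */
END
{
    printa("%s+%#x:\t%d\t%@d\n", @a);
}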
main+0x1: 1
main+0x3: 1
main+0x6: 1
main+0x9: 1
main+0xe: 1
main+0x11: 1
main+0x14: 1
main+0x17: 1
main+0x1a: 1
main+0x1c: 1
main+0x23: 101
main+0x27: 101
main+0x29: 100
main+0x2e: 100
main+0x31: 100
main+0x33: 100
main+0x35: 1
main+0x36: 1
main+0x37: 1
From the example given, this is exactly what I need. However, I have no idea what it is doing, how to save the DTrace program, or how to run it against the code I want to profile. So I opened this question hoping that people with a good DTrace background could help me understand the code, save it, run it, and hopefully get the results shown.
If all you want to do is run this particular DTrace script, simply save it to a .d script file and use a command like the following to run it against your compiled executable:
sudo dtrace -s dtracescript.d -c [Path to executable]
where you replace dtracescript.d with your script file name.
This assumes that you have DTrace as part of your system (I'm running Mac OS X, which has had it since Leopard).
If you're curious about how this works, I wrote a two-part tutorial on using DTrace for MacResearch a while ago, which can be found here and here.
