I downloaded a dataset that is supposed to be in RDF format (http://iw.rpi.edu/wiki/Dataset_1329). I opened it in Notepad++ but can't read it. Any suggestions?
The file is about 140 MB uncompressed, so Notepad++ is probably failing due to the sheer size of the file rather than the format. The RDF serialization used in this dataset is N-Triples: one triple per line, with three components (subject, predicate, object), and quite human-readable. Sample data from the file:
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_other_multi_racial> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_black_and_white> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/national_origin_hispanic> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/filed_cases> "1" .
If you want to have a look at the data, open it with a tool that streams the file rather than loading it all at once, for instance less or head.
If you want to use the data, you might want to look into loading it into a triple store (4store, Virtuoso, Jena TDB, ...) and querying it with SPARQL.
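For a quick peek in code, here is a minimal Python sketch in the same spirit as less/head. It is not a real N-Triples parser, just enough for well-formed one-triple-per-line data like the sample above:

```python
def peek_ntriples(lines, n=5):
    """Yield up to n (subject, predicate, object) tuples from an
    iterable of N-Triples lines, streaming instead of slurping."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        # N-Triples terms are space-separated; the object runs to " ."
        subj, pred, obj = line.split(' ', 2)
        yield subj, pred, obj.rstrip(' .')
        count += 1
        if count >= n:
            break

sample = ['<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> '
          '<http://data-gov.tw.rpi.edu/vocab/p/1329/filed_cases> "1" .']
for s, p, o in peek_ntriples(sample):
    print(s, p, o)
```

With the real file you would pass `open("data.nt", encoding="utf-8")` (file name assumed) instead of the list; since it iterates line by line, memory use stays flat regardless of file size.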
Try Google Refine (possibly with the RDF extension: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/).
I'm trying to load pre-trained word embeddings for Arabic (the Mazajak embeddings: http://mazajak.inf.ed.ac.uk:8000/). The embeddings file has no particular extension and I'm struggling to load it. What's the usual process for loading these embeddings?
I've tried with open("get_sg250", encoding=encoding) as file: file.readlines() for various encodings, but none of them seems to be the answer (utf-8 does not work at all), and with windows-1256 I get gibberish:
e.g.
['8917028 300\n',
'</s> Hل®:0\x16ء:؟X§؛R8ڈ؛\xa0سî9K\u200fƒ::m¤9¼»“8¤p\u200c؛tعA:UU¾؛“_ع9‚Nƒ¹®G§¹قفگ؛ww$؛\u200eba:\x14.„:R¸پ:0–\x0b:–ü\x06:×#¦؛Yٍ²؛m ظ:{\x14¦:µ\x01‡:ه\x17S¹Yr¯:j\x03-¹ff€9×£P¸\n',
'W‚؛UUه9¼»é¹""§؛\u200c¶د:UU؟:\u200eb؟¹{\x14\u200d¸,ù19ïî\u200d؛ئ\x12¯؛\x00\x00ا:\u200c6°7A§a؛ذé„؛ذi†؛®G\x14:حجŒ8\x03\u200cè9ه\x17¸؛ق]¦؛ڈآ5¸قفا9حج^:\x00€ٹ؛q=²:\x00\x00¢9\x14®أ9×£T¹لz‚:\x1bèG؛®G7؛ڑ™<:m\xa0ƒ¹""´9\x14®\x1d:"¢²؛®G-؛ڑ™~:±ن¸:\x18ث«:¸\x1e…؛`,8؛Hل\u200d¹±ن.:\x1f…¥؛لْ‚:ڑ™s:R¸\x0b؛ئ’\x07؛0–C؛ڈآ¸:ذéھ:ة/خ¹A\'¸:ڑ™ز:m\xa0\x1e:è´ظ::ي‡؛\n',
'×\x05؛Œ%8؛ش\x06~؛أُu:\x00\x00\n',
":‰ˆ\x149\x14®?؛ِ(\x05:«ھ…:)\\‡833G:Haط؛\x1f…¼:¼»'9\x00\x00 ؛=\n",
'6؛R¸‚¹¼;€؛\x1bè¾؛\x1bèw؛قف؛:A§\x1a؛""j؛K~J:Hل\x14؛ىرد:\u200c6\x0c؛–|ب؛‚Nm:cةد·:mک؛‰ˆھ9\x00\x00ü9DD(¹ذi\x1f:ذé¬؛,ù™9¼»\x1e:wwƒ؛\x03\u200cF87ذ©·×£Q؛\x1f…w؛ئ\x12ح؛\x00\x00\x007ٍ‹U8\x0etZ6“ك«؛cةط؛Haد؛–ü¼؛33?¹Œ%َ9أُخ9=\n',
'‹؛ق]ع:ڈآ/؛0–ق¹¤pُ¹Dؤخ:¤p¤؛\x1bèت9\u200ebé¹ùE‹:–üb7=ٹ؛:؟Xv؛×£c؛ِ(·؛è4\xa0؛cة‹؛0\x16ˆ؛ئ’U:""#؛ة/j:R8،:أُى9ذé€:ىQX:\x1f…L:""›؛K\u200f•؛ڈآں؛‰ˆ8¸ww´:""o؛è´…؛\n',
'W·؛¤pگ:{”¶؛\x0etJ¹\u200eb>:ùإة؛`¬أ؛ِ(ü9K\u200f™:‚N؛:لz;:ِ(ٹ:Œ¥ˆ؛§\n',
'ں؛ِ¨\xad:ڑ™q؛\u200c6\x19:×£H9¤p\x1c:\x03\u200cخ¹–üٹ8UU\x13؛Hلؤ¹è´ء؛ïnژ؛®Gک:è´¯9\x0etN؛O\x1b\x0b؛\x00\x00Z:\n',
'Wڑ؛""J؛؟طخ:\x03\u200c¹:لْ¬؛\u200c6ک9ڑ™D؛\x1bèT8ق]ƒ:¼»س:0–-:~±³:,y‰:è´،¸jƒأ:m\xa0]:A\'د:j\x03\x15؛Haد:""½:wwù¹ه\x17ء؛×#س:&؟œ9×£5؛Hلz¹\\ڈ€¹)\\¨؛O\x1bْ¹ه\x17\x1b¹ڈB×؛\x03\u200c™؛ىQز¹لz¤¹ذi\x1c:\\ڈژ9ùإV¹R¸€:ùإü9ww?9‰\x08\u200d:~±ؤ¹‚Nù¹‰ˆ\x10¹UUn؛\x11\x11ƒ؛ٍ‹چ8‰ˆ½:\x1bèî¹O\x1bè¶`¬´؛=\n',
'¢:\n',
I've also tried using pickle, but that doesn't work either.
Any suggestions on what I could try out?
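The first line of the dump ('8917028 300\n') looks like a binary word2vec header (vocabulary size and vector dimension), which would explain why no text encoding works: everything after the header is raw float32 data, not text. Assuming that format, the practical route is gensim's KeyedVectors.load_word2vec_format(path, binary=True). As an illustrative sketch of the binary layout (this is not gensim's implementation, and the tiny in-memory file below is made up):

```python
import io
import struct

def read_word2vec_binary(stream):
    """Parse the word2vec binary layout: an ASCII header line
    '<vocab_size> <dim>', then, per word, the token bytes, a single
    space, and <dim> little-endian float32 values."""
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b' ' or ch == b'':
                break
            if ch != b'\n':  # some writers emit a newline between records
                word.extend(ch)
        vec = struct.unpack('<%df' % dim, stream.read(4 * dim))
        vectors[word.decode('utf-8')] = vec
    return vectors

# Tiny in-memory example file: 2 words, 3 dimensions each
buf = io.BytesIO(b'2 3\n'
                 + b'hello ' + struct.pack('<3f', 1.0, 2.0, 3.0)
                 + b'world ' + struct.pack('<3f', 4.0, 5.0, 6.0))
vecs = read_word2vec_binary(buf)
print(vecs['hello'])  # (1.0, 2.0, 3.0)
```

In practice use the library loader rather than hand-rolling this; for a file with possibly lossy tokens, gensim's unicode_errors='ignore' option helps.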
I need to extract all the data I can find from these parts catalogs. Each catalog is in its own folder that looks like this.
Here is the PEiD output from the application
Here is what that data looks like when it's viewed in the software.
Here is a list of all the files in the application directory
ASYCFILT.DLL
BIN
BTRDP32.ocx
btrv.reg
catalog.dll
COMCAT.DLL
COMCT332.OCX
COMDLG32.OCX
DAO350.DLL
dao360.dll
DBADAPT.DLL
DBGRID32.OCX
DBLIST32.OCX
expsrv.dll
Findfile.avi
IMGADMIN.OCX
IMGCMN.DLL
IMGEDIT.OCX
IMGSCAN.OCX
IMGSHL.DLL
IMGTHUMB.OCX
jdX.exe
MFC42.DLL
msado21.tlb
MSADODC.OCX
MSBIND.DLL
MSCOMCT2.OCX
MSCOMCTL.OCX
MSCOMM32.OCX
MSDATGRD.OCX
MSDATLST.OCX
MSDERUN.DLL
MSFLXGRD.OCX
MSHFLXGD.OCX
MSJET35.DLL
msjet40.dll
MSJINT35.DLL
msjint40.dll
MSJTER35.DLL
msjter40.dll
msjtes40.dll
MSMAPI32.OCX
MSMASK32.OCX
MSRD2X35.DLL
msrd2x40.dll
msrd3x40.dll
MSREPL35.DLL
msrepl40.dll
MSSTDFMT.DLL
MSVBVM50.DLL
MSVBVM60.DLL
MSVCIRT.DLL
msvcrt.dll
MSVCRT40.DLL
mswdat10.dll
mswstr10.dll
OIADM400.DLL
OICOM400.DLL
OIDIS400.DLL
OIFIL400.DLL
OIGFS400.DLL
OIPRT400.DLL
OISLB400.DLL
OISSQ400.DLL
OITWA400.DLL
OIUI400.DLL
OLEAUT32.DLL
oleDB.dll
OLEPRO32.DLL
path.dat
prcopts.dat
print.txt
recall.dat
reg.bat
REGICON.OCX
RICHED32.DLL
RICHTX32.OCX
scrrun.dll
SIMPDATA.TLB
STDOLE2.TLB
SYSINFO.OCX
sysprint.dat
TABCTL32.OCX
Uninstall.exe
unreg.bat
VB5DB.DLL
VB6STKIT.DLL
VBAJET32.DLL
vsflex8l.ocx
W32mkde.exe
W32mkrc.dll
Wbtrv32.dll
WBTRVC32.DLL
Here is a listing of one of the PARTLIST.DAT files from XORStrings
How can I extract the data from the files?
I've been looking at reverse engineering, but I'm not sure which direction to go with it. Do I have to start at the application (exe) level to get the data extracted, or is there a way to extract it directly from the .dat file itself?
I used IDA to open the .dat file and I can see the data that matches the application, but I cannot extract that data without pulling all the raw data with it.
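One hedged direction, since the listing above came from XORStrings: the strings in PARTLIST.DAT may be XOR-obfuscated with a short key. Below is a minimal sketch assuming a single-byte key; the record and the key 0x5A are made up for illustration, and printable-ASCII scoring only narrows the candidates (several keys can tie, so inspect the top few by hand). Note also that W32mkde.exe and Wbtrv32.dll in the listing belong to the Btrieve (Pervasive) database engine, so the .dat files may actually be Btrieve databases that a Btrieve client could read directly.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (XOR is its own inverse)."""
    return bytes(c ^ key for c in data)

def rank_keys(data: bytes, top: int = 5) -> list:
    """Rank all 256 single-byte keys by the fraction of printable
    ASCII they produce; the real key usually scores near the top."""
    def score(k: int) -> float:
        out = xor_decode(data, k)
        return sum(32 <= c < 127 for c in out) / len(out)
    return sorted(range(256), key=score, reverse=True)[:top]

# Demo with a made-up record obfuscated with key 0x5A
plain = b'PART NO: 1234-567  DESCRIPTION: WIDGET'
obfuscated = xor_decode(plain, 0x5A)
print(xor_decode(obfuscated, 0x5A))  # round-trips back to the original
```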
I have a file containing RDF triples (subject-predicate-object) in Turtle syntax (.ttl), and I have another file in which I only have some subjects.
For example:
<http://dbpedia.org/resource/AlbaniaHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaHistory"@en .
<http://dbpedia.org/resource/AsWeMayThink> <http://www.w3.org/2000/01/rdf-schema#label> "AsWeMayThink"@en .
<http://dbpedia.org/resource/AlbaniaEconomy> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaEconomy"@en .
<http://dbpedia.org/resource/AlbaniaGovernment> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaGovernment"@en .
And in the other file I have, for example:
<http://dbpedia.org/resource/AlbaniaHistory>
<http://dbpedia.org/resource/AlbaniaGovernment>
<http://dbpedia.org/resource/Pérotin>
<http://dbpedia.org/resource/ArtificalLanguages>
I would like to get:
<http://dbpedia.org/resource/AlbaniaHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaHistory"@en .
<http://dbpedia.org/resource/AlbaniaGovernment> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaGovernment"@en .
So, I would like to remove from the first file the triples whose subjects are not in the second file. How can I do this?
I tried, in Java, reading the contents of the second file into an ArrayList and using the contains method to check whether the subject of each triple in the first file matches any line in the second file, but it is too slow since the files are very big.
Thank you very much for your help.
In Java, you could use an RDF library to read and write in streaming fashion and do some basic filtering.
For example, using RDF4J's Rio parser you could create a simple SubjectFilter class that checks, for each triple, whether it has a subject you want:
public class SubjectFilter extends RDFHandlerWrapper {

    // the subjects to keep -- use a HashSet so contains() is O(1),
    // unlike ArrayList, where every lookup is a linear scan
    private final Set<Resource> myListOfSubjects = new HashSet<>();

    public SubjectFilter(RDFHandler wrappedHandler) {
        super(wrappedHandler);
        // populate myListOfSubjects from the second file here
    }

    @Override
    public void handleStatement(Statement st) throws RDFHandlerException {
        // only write the statement if it has a subject we want
        if (myListOfSubjects.contains(st.getSubject())) {
            super.handleStatement(st);
        }
    }
}
And then connect a parser to a writer that spits out the filtered content, something along these lines:
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
RDFWriter rdfWriter = Rio.createWriter(RDFFormat.TURTLE,
new FileOutputStream("/path/to/example-output.ttl"));
// link our parser to our writer, wrapping the writer in our subject filter
rdfParser.setRDFHandler(new SubjectFilter(rdfWriter));
// start processing
rdfParser.parse(new FileInputStream("/path/to/input-file.ttl"), "");
For more details on how to use RDF4J and the Rio parsers, see the documentation.
As an aside: although this is perhaps more work than some command-line magic with grep and awk, the advantage is that it is semantically robust: you leave the interpretation of which part of the data is the triple's subject to a processor that understands RDF, rather than taking an educated guess with a regex ("it's probably the first URL on each line"), which may break when the input file uses a slightly different syntax variation.
(disclosure: I am on the RDF4J development team)
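That said, if the input really is one triple per line in N-Triples style (as the sample in the question suggests), the original ArrayList approach also becomes fast enough simply by swapping in a hash set. A Python sketch of that idea, with the syntax-variation caveat above fully applying (the sample data below mirrors the question):

```python
def filter_by_subject(triple_lines, subject_lines):
    """Yield only the triple lines whose first whitespace-separated
    token (the subject) appears in the subject list. Using a set
    makes each membership test O(1)."""
    wanted = {s.strip() for s in subject_lines if s.strip()}
    for line in triple_lines:
        if not line.strip():
            continue
        subject = line.split(None, 1)[0]
        if subject in wanted:
            yield line

triples = [
    '<http://dbpedia.org/resource/AlbaniaHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AlbaniaHistory"@en .',
    '<http://dbpedia.org/resource/AsWeMayThink> <http://www.w3.org/2000/01/rdf-schema#label> "AsWeMayThink"@en .',
]
subjects = ['<http://dbpedia.org/resource/AlbaniaHistory>']
for line in filter_by_subject(triples, subjects):
    print(line)
```

For the real files you would stream both with open(...) rather than building lists, keeping only the (smaller) subject set in memory.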
I am new to Oracle APEX 5.1, and I have been asked to implement a button that, when clicked, lets the user download a .doc file of an Interactive Report.
I have noticed that the Interactive Report gives you the option to download it as .pdf, .xls, and so on, but I need it to be a Word (.doc) file.
In addition, the file must be in a specific format (with heading, indentation, font, etc.) that I was given (as a template) in a Word file.
Any help would be appreciated.
Additional information: I was able to open the template (.doc) file in Notepad++ and get the HTML version of it, so I can edit it in both Notepad++ and Word.
One of the best tools for this is APEX Office Print (AOP), but it doesn't have a free licence.
Otherwise, you can check this solution:
How do we export an MS Word (or RTF) document (from a web browser) generated by PL/SQL?
I ended up finding information on this page: http://davidsgale.com/apex-how-to-download-a-file/ and I wrote this code:
declare
    l_clob clob;
begin
    l_clob := null;
    sys.htp.init;
    sys.owa_util.mime_header('application/vnd.ms-word', FALSE, 'utf-8');
    sys.htp.p('Content-length: ' || sys.dbms_lob.getlength( l_clob ));
    sys.htp.p('Content-Disposition: inline; filename="test_file.doc"' );
    sys.owa_util.http_header_close;
    -- SET_DOC_HEADER, SET_TABLE_HEADER, ADD_TABLE_ENTRY and SET_TABLE_FOOTER
    -- are my own functions, created in SQL Workshop, that emit the HTML
    sys.htp.p(SET_DOC_HEADER);
    sys.htp.p(SET_TABLE_HEADER);
    sys.htp.p(ADD_TABLE_ENTRY("arguments"));
    sys.htp.p(SET_TABLE_FOOTER);
    sys.wpg_docload.download_file(l_clob);
    apex_application.stop_apex_engine;
exception when others then
    --sys.htp.prn('error: '||sqlerrm);
    apex_application.stop_apex_engine;
end;
It works, but I had to create the functions in SQL Workshop because writing a table in HTML is really long.
I'm currently trying to attach image files to a model directly from a zip file (i.e. without first saving them to disk). It seems like there should be a clear way of converting a ZipEntry to a Tempfile or File that can be held in memory and passed to another method or object that knows what to do with it.
Here's my code:
def extract (file = nil)
Zip::ZipFile.open(file) { |zip_file|
zip_file.each { |image|
photo = self.photos.build
# photo.image = image # this doesn't work
# photo.image = File.open image # also doesn't work
# photo.image = File.new image.filename
photo.save
}
}
end
But the problem is that photo.image is an attachment (via paperclip) to the model, and assigning something as an attachment requires that something to be a File object. However, I cannot for the life of me figure out how to convert a ZipEntry to a File. The only way I've seen of opening or creating a File is to use a string to its path - meaning I have to extract the file to a location. Really, that just seems silly. Why can't I just extract the ZipEntry file to the output stream and convert it to a File there?
So the ultimate question: Can I extract a ZipEntry from a Zip file and turn it directly into a File object (or attach it directly as a Paperclip object)? Or am I stuck actually storing it on the hard drive before I can attach it, even though that version will be deleted in the end?
UPDATE
Thanks to blueberry fields, I think I'm a little closer to my solution. Here's the line of code that I added, and it gives me the Tempfile/File that I need:
photo.image = zip_file.get_output_stream image
However, my Photo object won't accept the file that's getting passed, since it's not an image/jpeg. In fact, checking the content_type of the file shows application/x-empty. I think this may be because getting the output stream seems to append a timestamp to the end of the file, so that it ends up looking like imagename.jpg20110203-20203-hukq0n. Edit: Also, the tempfile that it creates doesn't contain any data and is of size 0. So it's looking like this might not be the answer.
So, next question: does anyone know how to get this to give me an image/jpeg file?
UPDATE:
I've been playing around with this some more. It seems an output stream is not the way to go, but rather an input stream (the distinction has always kind of confused me). Using get_input_stream on the ZipEntry, I get the binary data in the file. Now I just need to figure out how to get this into a Paperclip attachment (as a File object). I've tried pushing the ZipInputStream directly to the attachment, but of course that doesn't work. I find it hard to believe that no one has tried to cast an extracted ZipEntry as a File. Is there some reason this would be considered bad practice? Skipping the disk write for a temp file seems perfectly acceptable, and something a zip-archive library ought to support.
Anyway, the question still stands:
Is there a way of converting an Input Stream to a File object (or Tempfile)? Preferably without having to write to a disk.
Try this
Zip::ZipFile.open(params[:avatar].path) do |zipfile|
  zipfile.each do |entry|
    filename = entry.name
    basename = File.basename(filename)
    tempfile = Tempfile.new(basename)
    tempfile.binmode
    tempfile.write entry.get_input_stream.read
    tempfile.rewind  # rewind so the attachment reads from the start

    user = User.new
    user.avatar = {
      :tempfile => tempfile,
      :filename => filename
    }
    user.save
  end
end
Check out the get_input_stream and get_output_stream messages on ZipFile.