I have a file ontobible.owl. how to extract that file and then save data to mysql (because I want display data from ontobible.owl in website). can anyone help me?
edited:
here is my ontobible.owl file (https://teamtrainit.com/ontobible.owl)
i've try open ontobible.owl with sublime text 3 and contains like this
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#HOS5_2">
<verseID>HOS5_2</verseID>
<verse_text>And the revolters are profound to make slaughter, though I have been a rebuker of them all.</verse_text>
</Verse>
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#2CH2_1">
<hasPerson rdf:resource="http://semanticbible.org/ns/2006/NTNames#god_1324"/>
<hasPerson rdf:resource="http://www.co-ode.org/roberts/family-tree.owl#solomon_2762"/>
<verseID>2CH2_1</verseID>
<verse_text>And Solomon determined to build an house for the name of the LORD, and an house for his kingdom.</verse_text>
</Verse>
how to convert that xml tag to array or json so I cant save it to mysql database
you have several options for extracting data from owl
use owl-api and write java code (i think owl api is accessible in other languages) to extract data and pack it in the format you need. also you can use sparql queries for extracting data via jena api
install protege, open your file in protege and save it in format json-dl. this format is very similar to the regular json and you can easily transform it for your needs
install fuseki server, add your file and using sparql queries extract data from there
i think that the second option is the easiest for start if you don't want to write queries or code and it won't take long
Related
I could really use some help in how to extract data from Word documents using SSIS and inserting the extracted data in SQL. There are 10,000 - 13,000 Word files to process. The files most likely aren't consistent over the years. Any help is greatly appreciated!
Below is the example data from the Word documents that I'm interested in capturing. Note that Date and Job No are in the Header section.
Customer : Test Customer
Customer Ref. : 123456
Contact : Test Contact
Part No. : 123456789ABCDEFG
Manufacturer : Some Mfg.
Package : 123-456
Date Codes : 1234
Lot Number : 123456
Country of Origin : Country
Total Incoming Qty : 1 pc
XRF Test Result : PASS
HCT Result : PASS
Solder Test Result : PASS
My approach would be this:
Create a script in Python that extracts your data from the Word files and save them in XML or JSON format
Create SSIS package to load the data from each XML/JSON file to SQL Server
1. Using a script component as a source
To import data from Microsoft Word into SQL Server, you can use a script component as a data source where you can implement a C# script to parse document files using Office Interoperability libraries or any third-party assembly.
Example of reading tables from a Word file
2. Extracting XML from DOCX file
DOCX file is composed of several embedded files. Text is mainly stored within an XML file. You can use a script task or Execute Process Task to extract the DOCX file content and use an XML source to read the data.
How can I extract the data from a corrupted .docx file?
How to extract just plain text from .doc & .docx files?
3. Converting the Word document into a text file
The third approach is to convert the Word document into a text file and use a flat-file connection manager to read the data.
convert a word doc to text doc using C#
Converting a Microsoft Word document to a text file in C#
I'm using SNAP dataset for social network analysis. SNAP uses simple edge list as a data format. How to read SNAP dataset in Apache Giraph?
As per I know SNAP has various data formats depending upon which dataset you are looking at. If the dataset that you are looking at has the format : sourceid destinationid on each line then you might want to use IntNullTextEdgeInputFormat (it's in giraph-core/src/main/java/org/apache/giraph/io/formats ).
Also take a look at various predefined formats available in the same folder. If none of those fit for your dataset format then you can write your own input format class (it will be really simple if you start from the predefined formats and edit it as you need).
use -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat
Yes, SNAP uses Simple Edge List format for representing graph databases. You can use this code for converting it to a JSON format which is accepted by Apache Giraph.
I'm trying to find an effective way of saving the result of my Spark Job as a csv file. I'm using Spark with Hadoop and so far all my files are saved as part-00000.
Any ideas how to make my spark saving to file with a specified file name?
Since Spark uses Hadoop File System API to write data to files, this is sort of inevitable. If you do
rdd.saveAsTextFile("foo")
It will be saved as "foo/part-XXXXX" with one part-* file every partition in the RDD you are trying to save. The reason each partition in the RDD is written a separate file is for fault-tolerance. If the task writing 3rd partition (i.e. to part-00002) fails, Spark simply re-run the task and overwrite the partially written/corrupted part-00002, with no effect on other parts. If they all wrote to the same file, then it is much harder recover a single task for failures.
The part-XXXXX files are usually not a problem if you are going to consume it again in Spark / Hadoop-based frameworks because since they all use HDFS API, if you ask them to read "foo", they will all read all the part-XXXXX files inside foo as well.
I'll suggest to do it in this way (Java example):
theRddToPrint.coalesce(1, true).saveAsTextFile(textFileName);
FileSystem fs = anyUtilClass.getHadoopFileSystem(rootFolder);
FileUtil.copyMerge(
fs, new Path(textFileName),
fs, new Path(textFileNameDestiny),
true, fs.getConf(), null);
Extending Tathagata Das answer to Spark 2.x and Scala 2.11
Using Spark SQL we can do this in one liner
//implicits for magic functions like .toDf
import spark.implicits._
val df = Seq(
("first", 2.0),
("choose", 7.0),
("test", 1.5)
).toDF("name", "vals")
//write DataFrame/DataSet to external storage
df.write
.format("csv")
.save("csv/file/location")
Then you can go head and proceed with adoalonso's answer.
I have an idea, but not ready code snippet. Internally (as name suggest) Spark uses Hadoop output format. (as well as InputFormat when reading from HDFS).
In the hadoop's FileOutputFormat there is protected member setOutputFormat, which you can call from the inherited class to set other base name.
It's not really a clean solution, but inside a foreachRDD() you can basically do whatever you like, also create a new file.
In my solution this is what I do: I save the output on HDFS (for fault tolerance reasons), and inside a foreachRDD I also create a TSV file with statistics in a local folder.
I think you could probably do the same if that's what you need.
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html#output-operations
I have given some data in an excel sheet to a 3rd party for SPSS data processing. After completion of the processing, what are the files that I should get back from them.
I have received one file with a ".sav" extension. I presume this file contains the imported data (from my excel file).
I have received documents (.rtf - rich text format) with the chart and graphs only. Is there something else I need to get so that I can use the files later on for further analysis.
Thanks in advance
V Karthick
Yes, the ".sav" extension is the data file. You should also request the syntax file(s), ".sps" extension. The syntax file is a record of all data transformations which have been performed and allows you to review their work. The syntax file can be opened with notepad or any text editor.
Arthur
I just want to know if there could be any way by which we can read a value from an .xls file using a .bat file.
For eg:If i have an .xls named test.xls which is having two columns
namely 'EID' and then 'mail ID'.Now when we give the input to the .xls the EID name.it should extract the mail id which corresponds to the EID and echo the result out.
**EID** **MailID**
E22222 MynameisA#company.com
E33333 MynameisB#company.com
...
...
So by the above table,when i give the input to the xls file using my .bat file as E22222,it should read the corresponding mail ID as MynameisA#company.com and it should echo the value.
So i hope i am able to present my doubt.Please get back to me for more clarifications.
Thanks and regards
Maddy
There is no facility to do this directly with traditional .bat files. However, you might investigate PowerShell, which is designed to be able to do this sort of thing. PowerShell integrates well with existing Windows applications (such as Excel) and may provide the tools you need to do this easily.
A quick search turned up this example of reading Excel files from PowerShell.
You can't do this directly from a batch file. Furthermore, to manipulate use Excel files in scripting you need Excel to be installed.
What you can do is wrap the Excel-specific stuff in a VBScript and call that from your batch.
You can do it with Alacon - command-line utility for Alasql database.
It works with Node.js, so you need to install Node.js and then Alasql package:
To take data from Excel file you can use the following command:
> node alacon "SELECT VALUE [mail ID] FROM XLS('mydata.xls', {headers:true})
WHERE EID = ?" "E2222"
Fist parameter is a SQL-expresion, which read data from XLSX file with header and search data
for second parameter value: "E22222". The command returns mail ID value.
This will be hard (very close to impossible) in BAT, especially when using the original XLS file, but even after an export to CSV it will be much easier to use a script/programming language (Perl, C, whatever) to do this.