Is it possible to store an in-memory Jena Dataset as a triple-store?

Warning! This question is a catch-all: I have zero experience with RDF systems, so I couldn't express this in a single question. Feel free to skip the first two paragraphs.
What I'm trying to build, overall
I'm currently building a Spring app that will be the back-end for a system that will gather measurements.
I want to store the info in a triple-store instead of an RDBMS.
So, you may imagine a Spring Boot app with the addition of the Jena library.
The workflow of the system
Here is the methodology I'm planning to use:
1. Once the app is up and running, it would either create a new triple-store database or connect to an existing one.
2. A POST request reaches an app controller.
3. I use a SPARQL update to insert the new entry into the triple-store.
4. Other Controller/Service/DAO methods serve GET requests by running SELECT queries on the triple-store.
*The only reason I provided such a detailed view of my final goal is to avoid answers that would call my question an XY problem.
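Step 3 is where most of the string building happens. The following is a minimal illustration, not the asker's code: it assumes a hypothetical vocabulary under http://example.org/measurement/ and a measurement made of an id, a sensor name, a numeric value and a timestamp. It turns one POSTed measurement into a SPARQL 1.1 INSERT DATA update and escapes the string literal so that a quote in the input cannot break the update. It is written in Python only to stay dependency-free; a Spring app would build the same string in Java.

```python
import math
from datetime import datetime, timezone
from urllib.parse import quote

# Hypothetical vocabulary for this example; substitute your own ontology.
EX = "http://example.org/measurement/"


def escape_literal(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def measurement_insert(measurement_id: str, sensor: str, value: float,
                       taken_at: datetime) -> str:
    """Build a SPARQL 1.1 INSERT DATA update for one measurement."""
    if not math.isfinite(value):
        raise ValueError("value must be a finite number")
    # Percent-encode the id so it is safe inside an IRI.
    subject = f"<{EX}m/{quote(measurement_id, safe='')}>"
    return (
        f"PREFIX ex: <{EX}>\n"
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "INSERT DATA {\n"
        f"  {subject} a ex:Measurement ;\n"
        f'    ex:sensor "{escape_literal(sensor)}" ;\n'
        f'    ex:value "{float(value)!r}"^^xsd:double ;\n'
        f'    ex:takenAt "{taken_at.isoformat()}"^^xsd:dateTime .\n'
        "}"
    )


if __name__ == "__main__":
    print(measurement_insert("m-1", "probe A", 21.5,
                             datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

Typed literals (xsd:double, xsd:dateTime) are used so that later SELECT queries can compare and sort values numerically and chronologically rather than as strings.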
The actual problem
1. Does an org.apache.jena.query.Dataset represent an in-memory triple-store, or is this kind of Dataset a completely different data structure?
2. If a Dataset is indeed a triple-store, then how can I store this in-memory Dataset to retrieve it in a later session?
3. If indeed one can store a Dataset, then what are the options? Is the default storing a Dataset as a file with .tdb extension? If so then what is the method for that and under which class?
4. If my guesses so far are correct, would the assemble method be sufficient to "retrieve" the triple-store from the stored file?
5. Do all triple-store databases follow this concept, of being stored in .tdb files?

org.apache.jena.query.Dataset is an interface - there are multiple implementations with different characteristics.
DatasetFactory makes datasets of various kinds. DatasetFactory.createTxnMem is an in-memory, transactional dataset. It can be initialized with the contents of files but updates do not change the files.
An in-memory dataset only exists for the lifetime of the JVM session.
If you want data and data changes to persist across sessions, use TDB for persistent storage. Try TDBFactory or TDB2Factory.
TDB1 and TDB2 are triplestore databases.
Fuseki is Jena's triplestore server. You can send SPARQL requests to Fuseki (query, update, bulk upload, ...).
You can start Fuseki with a TDB database (it creates the database if it does not exist):
fuseki-server --tdb2 --loc DB /myData
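Once Fuseki is running like this, the Spring app from the question could talk to it over plain HTTP using the SPARQL 1.1 Protocol instead of embedding the database. A minimal Python sketch (Python only to stay dependency-free; any HTTP client, Java's included, works the same way), assuming Fuseki's default port 3030, the dataset name /myData from the command above, the default query and update endpoints under that dataset, and that updates are enabled on the server (Fuseki has an --update option for this, if your setup needs it). It only builds the requests; sending them needs a running server.

```python
import urllib.parse
import urllib.request

# Assumed location: Fuseki's default port plus the dataset name used above.
FUSEKI_DATASET = "http://localhost:3030/myData"


def update_request(update: str, base: str = FUSEKI_DATASET) -> urllib.request.Request:
    """SPARQL Update sent as an HTML-form POST body (SPARQL 1.1 Protocol)."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        base + "/update", data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def query_request(query: str, base: str = FUSEKI_DATASET) -> urllib.request.Request:
    """SPARQL query sent as a GET parameter, asking for JSON results."""
    url = base + "/query?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, method="GET",
        headers={"Accept": "application/sparql-results+json"})
```

With a server running, urllib.request.urlopen(update_request(...)) performs the insert (step 3 of the question's workflow), and json.load(urllib.request.urlopen(query_request(...))) parses the SELECT results (step 4). Because the data lives in the TDB2 directory managed by Fuseki, it survives restarts of both the app and the server.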
".tdb" isn't a file extension Apache Jena uses. Databases are a directory of files.

Related

NoSQL Database - Saving JSON Files on Local Server instead of a Database Server

What are the disadvantages of saving json files on the virtual machine you paid for instead of saving it in a database like MongoDB (security concerns, efficiency,...)?
I have tried using both ways of storing data but still couldn't find any difference in performance.
It's probably faster to store the JSON files simply as files in the filesystem of your virtual machine.
Unless you need to do one or more of the following:
Use a query language to extract parts of the JSON files.
Search the JSON files efficiently.
Allow clients on multiple virtual machines to access the same JSON data via an API.
Enforce access controls, so some clients can access some JSON data.
Make a consistent backup copy of the data.
Store more JSON data than can fit on a single virtual machine.
Enforce a schema so the JSON files are assured to have consistent structure.
Update JSON data in a way that is assured to succeed completely, or else make no change.
Run aggregate queries like COUNT(), MAX(), SUM() over a collection of JSON files.
You could do all these things by writing more code. It will take you years to develop that code to be bug-free and well-optimized.
By the end, what would you have developed?
You'd have developed a database management system.
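To give a sense of what "writing more code" means, here is just one item from that list, all-or-nothing updates, done by hand for a single JSON file. This is a minimal Python sketch of the common write-to-a-temporary-file-then-rename technique, offered as an illustration rather than something the answer prescribes. It relies on os.replace being atomic on POSIX systems when both paths are on the same filesystem, and it covers nothing else on the list (no locking, no querying, no schema).

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Replace the JSON file at `path` with `data`, all-or-nothing.

    The new content goes to a temporary file in the same directory, is
    flushed to disk, and is then renamed over the target, so readers see
    either the old file or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # The original file is untouched; remove the partial temporary file.
        os.unlink(tmp_path)
        raise
```

If serialization fails halfway (for example, a value json cannot encode), the exception propagates and the previous file content is still intact.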
Well, for small data you probably won't notice a difference. You may even find that data from your VM comes back faster, because you're not sending requests to another remote server.
But as your data grows, it will become hard to maintain. That's why we use a database management system to manage and process our data efficiently.
So, if you are storing a small configuration file, you can use the filesystem for that; otherwise I definitely recommend using a DBMS.

can we use JSON as a database?

I'm looking for fast and efficient data storage to build my PHP-based website. I'm aware of MySQL. Can I use a JSON file in my server root directory instead of a MySQL database? If yes, what is the best way to do it?
You can use any single file, including a JSON file, like this:
Lock it somehow (look up PHP file locking; PHP's flock() function does this).
Read the data from the file and parse it into an internal data structure.
Optionally modify the data in the internal data structure.
If you modified the data, truncate the file to 0 length and write the new data to it.
Unlock the file as soon as you can; other requests may be waiting.
You can keep using the data in internal structures to render the page; just remember that it may be outdated as soon as you release the file lock, because other HTTP requests can modify it.
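The lock/read/modify/write/unlock cycle above can be sketched as follows. The question is about PHP, where flock() does the locking; this minimal sketch uses Python's fcntl.flock instead, which is POSIX-only, so it assumes a Linux or macOS server. The function name update_json_file and its modify callback are illustrative, not from the answer.

```python
import fcntl  # POSIX only
import json
import os


def update_json_file(path, modify):
    """Lock, read, optionally modify, write back, and unlock a JSON file.

    `modify` receives the parsed data and returns the new data, or None
    to leave the file unchanged. Returns the data as it now stands.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # 1. lock (blocks until free)
        try:
            text = f.read()                  # 2. read and parse
            data = json.loads(text) if text.strip() else {}
            new_data = modify(data)          # 3. optionally modify
            if new_data is None:
                return data
            f.seek(0)                        # 4. truncate and rewrite
            f.truncate()
            json.dump(new_data, f)
            f.flush()
            os.fsync(f.fileno())
            return new_data
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # 5. unlock as soon as possible
```

Every request holds an exclusive lock for the whole read-parse-write cycle, which is why the answer warns that this becomes a bottleneck as data and users grow.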
Also, if you modify the data from a user's web form, remember that it may have been modified in between. For example: a page with a user's details is loaded for editing, then another user deletes that user, then the editor tries to save the changed details. The editor should probably get an error instead of re-creating the deleted user.
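One common way to produce that error instead of silently re-creating the user is an optimistic check with a version counter. The version field, ConflictError and save_user are hypothetical names for this minimal sketch, and the users dict stands in for the data read while holding the file lock.

```python
class ConflictError(Exception):
    """Raised when the record changed (or vanished) since it was loaded."""


def save_user(users: dict, user_id: str, changes: dict, loaded_version: int) -> dict:
    """Apply `changes` only if the stored record still has `loaded_version`.

    `users` maps user id -> record dict containing a "version" counter
    (an assumption of this sketch) that is bumped on every save.
    """
    current = users.get(user_id)
    if current is None:
        raise ConflictError(f"user {user_id!r} was deleted by someone else")
    if current["version"] != loaded_version:
        raise ConflictError(f"user {user_id!r} was modified by someone else")
    updated = {**current, **changes, "version": loaded_version + 1}
    users[user_id] = updated
    return updated
```

The editing form would carry the version it was rendered with; the save request passes it back as loaded_version.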
Note: this is very inefficient. If you are building a site where you expect more than, say, 10 simultaneous users, you have to use a more sophisticated scheme, or just use an existing database. Also, you can't have too much data, because parsing JSON and generating the modified JSON takes time.
As long as you have just one user at a time, it will just get slower and slower as the amount of data grows. But as the user count increases (and more users means both more requests and more data), things get dramatically slower, and you very soon hit the limit where HTTP requests start to time out before the file is available to handle them.
At that point, do not try to hack it to make it faster; instead, pick an existing database framework (SQL, NoSQL or file-based). If you start hacking together your own, you just end up reinventing the wheel, usually poorly :-). The exception is a programming exercise, but even then it might be better to learn to use an existing framework.
It may be a bit late, but I wrote an Object Document Mapper for JSON files called JSON ODM. If it is still needed, it is open source under the MIT License.
It provides a query language and some GeoJSON tools.
The new version of IBM Informix, 12.10 xC2, now supports JSON.
Check the link: http://pic.dhe.ibm.com/infocenter/informix/v121/topic/com.ibm.json.doc/ids_json_007.htm
The manual says it is compatible with MongoDB drivers.
About the Informix JSON compatibility
Applications that use the JSON-oriented query language, created by
MongoDB, can interact with data stored in Informix® databases. The
Informix database server also provides built-in JSON and BSON (binary
JSON) data types.
You can use MongoDB community drivers to insert, update, and query
JSON documents in Informix.
I'm not sure, but I believe you can use the Innovator-C edition (free for production) to test it and use it at no cost, even in a production environment.
One obvious case where you might prefer JSON (or another file format) over a database is when all of your (relatively small) data is stored in the application cache.
When the application server (re)starts, the application reads the data from the file(s) and stores it in a data structure.
When the data changes, the application updates the file(s).
Advantage: no database.
Disadvantage: for a number of reasons, this only works for systems with relatively small data, for example a very specific product site with several hundred products.
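A minimal Python sketch of that pattern, assuming a single application process is the only writer of the file (there is no locking). The class name FileBackedCache is hypothetical. Reads are served from memory; every change rewrites the whole file, which is exactly why this only suits small data.

```python
import json
import os


class FileBackedCache:
    """Keep all (small) data in memory; load it on start, rewrite on change."""

    def __init__(self, path: str):
        self.path = path
        if os.path.exists(path):  # (re)start: read the file into memory
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)
        else:
            self.data = {}

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value
        # Write a temporary file and rename it over the original, so a
        # crash mid-write leaves the previous file intact.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
```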

Storing data in text files instead of SQL Server

I'm intending to use both SQL Server and simple text files to save my data.
Information like user data is going to be stored in SQL Server. The RSS feed for each user is going to be stored in a folder named after the user Id, and inside this folder I put the files that store the data. Each file can hold only 20 lines; if there are more than 20, I create a new file.
When I need to read this data, I simply open the last file in the user's folder.
What are the advantages and disadvantages of this method?
Thanks
I would suggest storing the text file data as either VARCHAR(8000) or a BLOB in a table in the database.
The advantages of storing it in the database are:
All your data is stored in a single place. It is very easy to back up and restore elsewhere, if required.
A database handles concurrency by default: if, say, multiple users try to access the same row or the same table, the database handles it inherently.
When you go for a hybrid of files and database, you are going for distributed storage, and you always have to make sure the two are consistent.
If you only want to store the latest text file content, use UPDATE. If you want to keep the history of earlier text file contents, use SCD Type 2 style storage or a history table containing the previous text file data.
A database is a self-contained unit, and you can do many things on it in one place: Transparent Data Encryption, masking, access control and other security features. In a hybrid approach, you have to manage security in two places.
When all your data is in a single place and you have proper indexes, you can write SQL queries covering many different reporting use cases. But if the data is distributed, you have to work out how you will handle each reporting use case.
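As a rough illustration of the single-table approach, here is a minimal sketch using SQLite through Python's built-in sqlite3 module as a stand-in for SQL Server (the table and function names are hypothetical). Rather than storing whole 20-line files, it stores one row per feed item, so history is kept automatically and "open the last file" becomes an ORDER BY ... LIMIT query served by an index.

```python
import sqlite3


def open_feed_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS feed_item (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id   INTEGER NOT NULL,
            content   TEXT    NOT NULL,
            added_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
        )""")
    # Lets "latest items for one user" avoid scanning the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_feed_user ON feed_item (user_id, id)")
    return conn


def add_item(conn, user_id, content):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO feed_item (user_id, content) VALUES (?, ?)",
                     (user_id, content))


def latest_items(conn, user_id, limit=20):
    rows = conn.execute(
        "SELECT content FROM feed_item WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit)).fetchall()
    return [row[0] for row in rows]
```

Backups, concurrent access and consistency then come from the database, which is the answer's main argument.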
The question is not quite the right one to ask.
You should start by clarifying the requirements for the application. Ask yourself the following questions:
What types of data queries need to be executed (selects, updates, reports)?
How many users will there be? How often will their requests come in? Must data be synchronized across users (concurrency)?
Do you need authentication and authorization, or localization?
Do you need modification history support?
Etc.
Databases usually have all these mechanisms, so you do not have to implement them in your application.
Depending on your application needs you decide what strategy to use for storing the data: by means of database, files, or by both approaches.

Is using JSON data better than querying a database when there is no security issue for the data?

For my new project I'm looking to use JSON data stored as a text file rather than fetching data from the database. My idea is to save a JSON file on the server whenever the admin creates a new entry in the database.
As there is no security issue, will this approach make user access to the data faster, or should I go with the usual database queries?
JSON is typically used as a way to format the data for the purpose of transporting it somewhere. Databases are typically used for storing data.
What you've described may be perfectly sensible, but you really need to say a little bit more about your project before the community can comment on your approach.
What's the pattern of access? Is it always read-only for the user, editable only by site administrator for example?
You shouldn't worry about performance early on. Worry more about ease of development, maintenance and reliability, you can always optimise afterwards.
You may want to look at http://www.mongodb.org/. MongoDB is a document-oriented store that stores JSON-like documents (in a binary form called BSON).
JSON combined with jQuery is a great option for fast, smooth web page updates, but ultimately it still comes down to the same database query.
Just make sure your query is efficient. Use a stored procedure.
JSON is just the way the data is sent from the server (a web controller in MVC, or code-behind in standard C#) to the client (jQuery or JavaScript).
Ultimately the database will be queried the same way.
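To illustrate that JSON is only the transport format: the server still runs the query and then serializes the rows. A minimal Python sketch with sqlite3 standing in for the real database; the product table and its columns are hypothetical.

```python
import json
import sqlite3


def products_as_json(conn: sqlite3.Connection) -> str:
    """Run the query, then serialize the rows to JSON for the client."""
    cur = conn.execute("SELECT id, name, price FROM product ORDER BY id")
    columns = [col[0] for col in cur.description]  # column names from the cursor
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])
```

Whether the client receives this string from a live query or from a pre-generated file, the same query had to run somewhere, and a pre-generated file must then be kept in sync with the database.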
You should stick with the classic method (database), because you'll face many problems with concurrency and with having too many files to handle.
I think you should go with usual database query.
If you use JSON files, you'll have to keep them in sync with the DB (which means extra work), and you'll face I/O problems if your site is very busy.

Core Data vs. Database fundamental difference?

Can someone explain to me what the fundamental difference is between Core Data (apparently, a "data store") and a database like SQLite or MySQL?
I am writing an iPhone app and needed a table of static data to display. I thought Core Data would be a good choice for this, so I got everything set up and functioning as far as the database (I'm sorry, data STORE) went, and then tried to import my data (it was in an Excel file, which I exported to CSV). I expected it to be a straightforward process, like I have done in SQLite and other databases many times, but as it turned out after much research, the only "official" way to do this was to write a parser specifically for my data.
When I asked about this on the Apple Developer forums, the response I got was basically "What kind of idiot are you to think that you could possibly import data directly without having to write code to do it? Core Data isn't a database; it's a data STORE!!" For the life of me, though, I can't see the distinction. In every way I have looked at it, Core Data behaves EXACTLY like a database, with a fancy way of accessing it and enough abstraction that it can use a variety of file formats for actually storing the data. In fact, I was eventually able to import my data using a simple SQLite .import command, so I really don't understand why the concept was so foreign to the responders to my original question.
So what am I missing here? What is so fundamentally different about a data store from a database that makes the concept of simple data importing completely alien to those who know the technology?
Core Data is not simply a means of persisting data to and from disk, as SQL is. Core Data's true function is to provide the complete model layer for the Model-View-Controller app design that the Apple API uses. As such, Core Data is primarily an object-graph manager with persistence options tacked onto the side.
An object graph is a collection of live objects in memory. In Core Data, these are the managed objects. They are called "managed" objects because the managed object context constantly observes them, making sure they are in the states and relationships that the data model says they should be in.
Core Data does provide persistence options, but exactly which option is used in any particular implementation is largely hidden. You can even use the same data model and managed objects with different persistence methods, sometimes in the same app.
The key difference from SQL is that SQL writes the actual data to disk, whereas Core Data serializes live objects. When you look at a SQLite store in Core Data, you are looking at objects that have been taken apart and "freeze dried". Obviously, "freeze drying" objects requires a rather specific data format in the SQLite store, so Core Data uses its own custom schema that is largely the same regardless of the details of the store.
That is why you can't just swap in any old SQL file and expect Core Data to import it. The SQL file contains rows, tables and columns of data, not the specialized tables, columns and rows used to reconstitute freeze-dried objects.
Since Core Data is first and foremost an object-graph manager, the only supported and reliable means of importing data is to create the object-graph. In the case of an SQL file, that means reading the SQL data using the SQL api and then generating managed objects from that data and then saving them to a persistent store.
That part is more work, but you save time integrating the data into the rest of the app and upgrading it, and you gain reliability and maintainability.
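The import path described above (read the rows, build the object graph, then save) can be sketched in a language-neutral way. This is an illustration only: Python dataclasses stand in for managed objects, the Department/Employee model is hypothetical, and a real Core Data import would insert managed objects into a managed object context and then save it. The point is that flat rows must become linked objects, with both sides of each relationship set, before anything is persisted.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Department:
    name: str
    employees: list = field(default_factory=list)


@dataclass
class Employee:
    name: str
    # Back-reference excluded from repr/eq to avoid infinite recursion.
    department: Department = field(repr=False, compare=False)


def import_graph(csv_text: str) -> dict:
    """Turn flat CSV rows into linked objects (an object graph)."""
    departments = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        dept = departments.get(row["department"])
        if dept is None:
            dept = departments[row["department"]] = Department(row["department"])
        employee = Employee(row["name"], dept)
        dept.employees.append(employee)  # keep both sides of the relationship in sync
    return departments
```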
A dictionary definition gives me:
Databases are data stores, but a data store isn't always a database.
The feature you expected isn't available in some databases either (though it is in most).
A data store can for example store non-relational data.
They should have just pointed you at the Wikipedia article on Core Data.
According to that article, "It allows data organised by the relational entity-attribute model to be serialised into XML, binary, or SQLite stores. The data can be manipulated using higher level objects representing entities and their relationships. Core Data manages the serialised version, providing object lifecycle and object graph management, including persistence. Core Data interfaces directly with SQLite, insulating the developer from the underlying SQL."
I guess it's the fact that "Core Data manages the serialised version" that means you can't import data directly. That is, you probably can't import data directly into SQLite in such a way that Core Data can manage it, although you probably can import data directly into SQLite in some way.
Core Data is not a data store; a data store is one part of Core Data. Core Data is more closely related to an Object Relational Mapping (ORM) tool. Core Data has the option of using SQLite for its data store, but you can also choose XML files, a proprietary format, or write your own data store.
I'm not sure how you were able to import your data with a SQL import; it shouldn't be compatible with Core Data, since Core Data creates a proprietary SQL database schema that contains a lot of metadata.
Maybe it's better to think of Core Data as an "object store" and a database as a "data store". Core Data is good when you have a variety of types of object with relationships to each other. The familiar example is a company with employees, who have bosses and reports, belong to departments, are assigned to clients and projects, have schedules, and go to meetings. Employees can get reassigned, etc. Even the types of relationships defined vary from time to time. That's a more heavyweight process even with Core Data, but Core Data makes it easier than with a raw database.
If you just have "data", and not "objects", it's easier to use a database. For example if you just have a table of the elements with atomic weights, etc., you might want to just use a database.
For your application it sounds like you just have one table. It will be easy to just use SQLite, which is available, so use it if it's more convenient.
On the other hand, the iOS SDK has some pre-built features that interact with Core Data. If you use SQLite, you don't get those. So you might avoid custom code to import your data but have to write custom code to display your data. Tough luck. When creating software, sometimes you have to write code. Weird, I know.
