Can I access the same data using different APIs?
Example: Can I insert the data using YCQL and read it using YEDIS?
The short answer is no. Even though the APIs share a common distributed document store, the data modeling and query constructs they offer are significantly different. This means that data inserted or managed by one API cannot be queried by the other API. More on this in the note here: https://docs.yugabyte.com/latest/introduction/#what-client-apis-are-supported-by-yugabyte-db
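To make the separation concrete, here is a rough Java sketch. Everything in it is an assumption: a local YugabyteDB node with YCQL on port 9042 and YEDIS on port 6379, the DataStax 3.x Java driver, and the Jedis client. A row written through YCQL simply does not show up as a YEDIS key:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import redis.clients.jedis.Jedis;

public class ApiSeparationSketch {
    public static void main(String[] args) {
        // Write through the YCQL API (Cassandra-compatible).
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").withPort(9042).build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo");
            session.execute("CREATE TABLE IF NOT EXISTS demo.users (id text PRIMARY KEY, name text)");
            session.execute("INSERT INTO demo.users (id, name) VALUES ('u1', 'Alice')");
        }

        // Read through the YEDIS API (Redis-compatible): the YCQL row is not visible here.
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            System.out.println(jedis.get("u1")); // prints null - no such key in the YEDIS keyspace
        }
    }
}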
Warning! This question is broad: I have zero experience with RDF systems, so I couldn't express this as a single question. Feel free to skip the first two paragraphs.
What I'm trying to build, overall
I'm currently building a Spring app that will be the back-end for a system that will gather measurements.
I want to store the info in a triple-store instead of an RDBMS.
So, you may imagine a Spring Boot app with the addition of the Jena library.
The workflow of the system
Here is the methodology I'm planning to follow:
1. Once the app is up and running, it will either create a new triple-store database or connect to an existing one.
2. A POST request reaches an app controller.
3. I use a SPARQL update to insert the new entry into the triple-store.
4. Other Controller/Service/DAO methods exist to serve GET requests for SELECT queries on the triple-store.
*The only reason I provided such a detailed view of my final goal is to avoid answers that would call my question an XY problem.
The actual problem
1. Does a org.apache.jena.query.Dataset represent an in memory triple-store or is this kind of Dataset a completely different data structure?
2. If a Dataset is indeed a triple-store, then how can I store this in-memory Dataset to retrieve it in a later session?
3. If one can indeed store a Dataset, what are the options? Is the default to store a Dataset as a file with a .tdb extension? If so, what is the method for that, and which class provides it?
4. If my guesses so far are correct, would the assembler mechanism be sufficient to "retrieve" the triple-store from the stored file?
5. Do all triple-store databases follow this concept of being stored in .tdb files?
org.apache.jena.query.Dataset is an interface - there are multiple implementations with different characteristics.
DatasetFactory makes datasets of various kinds. DatasetFactory.createTxnMem() creates an in-memory, transactional dataset. It can be initialized with the contents of files, but updates do not change those files.
An in-memory dataset only exists for the lifetime of the JVM session.
If you want data and data changes to persist across sessions, you can use TDB for persistent storage. Try TDBFactory or TDB2Factory.
TDB (TDB1 or TDB2) are triplestore databases.
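A minimal sketch of the difference, assuming Jena's jena-tdb2 artifact is on the classpath and that "DB" and "data.ttl" are just example names for a storage directory and an input file:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb2.TDB2Factory;

public class DatasetKinds {
    public static void main(String[] args) {
        // In-memory, transactional dataset: contents are gone when the JVM exits.
        Dataset mem = DatasetFactory.createTxnMem();

        // TDB2 dataset backed by a directory on disk; the directory is created if it
        // does not exist, and its contents persist across JVM sessions.
        Dataset tdb = TDB2Factory.connectDataset("DB");

        tdb.begin(ReadWrite.WRITE);
        try {
            RDFDataMgr.read(tdb, "data.ttl"); // load a file into the persistent store
            tdb.commit();
        } finally {
            tdb.end();
        }
    }
}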
Fuseki is the triple store server. You can send SPARQL requests to Fuseki (query, update, bulk upload, ...)
You can start Fuseki with a TDB database (it creates one if it does not exist):
fuseki-server --tdb2 --loc=DB /myData
".tdb" isn't a file extension Apache Jena uses. Databases are a directory of files.
I have an offline-ready application that I am currently building in Electron.
The core requirements are that all data is restricted (you have to be a user to read or write), and that within that data some data is further restricted to a specific user (account information, messages, etc.).
Now, I do not want to replicate any data offline that a user should not have access to (because all the data can be seen using the devtools regardless of restriction). Essentially, I only want to sync data to PouchDB's offline store if that user has access to it, along with the data that all users have access to.
Now I have read the following posts/guides but I am still a little confused.
https://pouchdb.com/2015/04/05/filtered-replication.html
https://www.joshmorony.com/creating-a-multiple-user-app-with-pouchdb-couchdb/
Restricting Access to local PouchDB
From my understanding, filtered replication is a bad choice performance-wise, even though it could do what I want.
Setting up a proxy would work, but it then essentially becomes a REST API and the data synchronization falls apart.
And the final option, which I think is what I want, is to have a database for every user that would contain their private information, plus additional databases to hold the information that is available to every user.
The only real question I have with this approach is how data that is private but shared between two users (messages, etc.) is handled.
I am more after an overarching view of how the data should be stored as opposed to code examples; I'm just really struggling with the conceptual architecture of the application.
There are many solutions to your problem, but one looks very promising: IBM Cloudant has started work on Cloudant Envoy, a proxy that simulates the CouchDB interface rather than exposing a simple REST API. You can read more about Envoy over at ibm.com, and a custom replicator for PouchDB is also available on GitHub.
There's also a blog post about this on Medium.com.
The idea is the same as the much older Couchbase Sync Gateway. Although Couchbase has common roots with CouchDB, I have not tracked whether they still support replication with CouchDB.
The easiest way to start would be to create a single database per user on the server, and a common database that you just pull the shared data from. Let me know if you need more info on this solution.
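The client-side sync would be plain PouchDB in the Electron app, but the server-side layout of that "database per user plus a shared database" approach can be sketched directly against CouchDB's HTTP API. Here is a rough Java sketch; the host, admin credentials, and database names are all assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class PerUserDbSetup {
    private static final String COUCH = "http://localhost:5984"; // hypothetical CouchDB server
    private static final String AUTH =
        "Basic " + Base64.getEncoder().encodeToString("admin:password".getBytes());

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. A private database for one user.
        put(client, COUCH + "/userdb-alice", "");

        // 2. Restrict its membership so only that user (and admins) can replicate it.
        put(client, COUCH + "/userdb-alice/_security",
            "{\"admins\":{\"names\":[],\"roles\":[\"_admin\"]},"
          + "\"members\":{\"names\":[\"alice\"],\"roles\":[]}}");

        // 3. A shared database that every authenticated user pulls from.
        put(client, COUCH + "/shared", "");
    }

    private static void put(HttpClient client, String url, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", AUTH)
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> res = client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("PUT " + url + " -> " + res.statusCode());
    }
}

For the "private but shared between two users" case, one common pattern is the same idea at a finer grain: a small database per conversation whose members list contains both users.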
Can data be permanently stored in persistent storage, such that any time the application loads I can get the last inserted item?
If yes, can I get a code sample or a link to some examples?
Of course! That's how modern databases work.
My advice is to consider storing your data as JSON in MongoDB for maximum portability:
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
I highly recommend mLab, who offer MongoDB as a service - you can host up to 500MB across one or more databases. Follow the steps in their getting started documentation (5-10 mins). Then download Robomongo and configure it with the connection settings from mLab; you'll be able to view all of your databases, collections (aka tables) and documents (aka records), and interact with them using either basic point-and-click or programmatically through its Mongo Shell interface.
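If you go that route, getting back the last inserted item on startup is a few lines with the MongoDB Java driver. A rough sketch, where the connection string, database, and collection names are placeholders (with mLab you would paste in the URI from their dashboard):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

public class LastItemExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> items =
                client.getDatabase("mydb").getCollection("items");

            // Insert something; ObjectId _id values are roughly time-ordered.
            items.insertOne(new Document("name", "latest thing"));

            // Fetch the most recently inserted document by sorting on _id descending.
            Document last = items.find().sort(Sorts.descending("_id")).first();
            System.out.println(last);
        }
    }
}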
See the developer guide section on storage, where the various types of storage options are discussed in depth. There are many samples in that section, but you will need to clarify a more exact use case of what you are trying to accomplish.
If you just want to save one variable you can use something like:
// store and retrieve a single value with the Preferences API
Preferences.set("var", val);
String var = Preferences.get("var", defaultValue);
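For the specific "last inserted item" case, here is a minimal sketch using that same Preferences API (the key name "lastInsertedItem" is just an example):

import com.codename1.io.Preferences;

public class LastItem {
    private static final String KEY = "lastInsertedItem";

    // Call whenever a new item is inserted.
    public static void saveLast(String item) {
        Preferences.set(KEY, item);
    }

    // Call on startup; returns null if nothing has been stored yet.
    public static String loadLast() {
        return Preferences.get(KEY, (String) null);
    }
}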
I'm working on a cloud-based line-of-business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and together there are several million docs stored. I use SQL Server.
Today I have a somewhat RESTful API which allows users to pass in a DocumentSearchQuery entity where they supply a keyword together with the requested sort order and paging info. They get a DocumentSearchResult back, which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to other entity types than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
There's no built-in limit on what fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I will need to do my own parsing of incoming queries again.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on what fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
True. However, you can check the fields referenced in the filter against a list of allowed (indexed) fields and either allow or deny the operation.
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I will need to do my own parsing of incoming queries again.
Yes, there is a provider for EF. That means that if you use something else in the future, you will need to write your own provider. If you expect to move away from EF, then the decision was probably taken too early, and I don't recommend WCF DS in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
Right, there isn't any out-of-the-box support for that in WCF Data Services. However, that is part of the authorization mechanism you will need to implement anyway. But I have good news for you: doing it is pretty easy with QueryInterceptors - you simply intercept the query and restrict it based on the user's privileges. This is something you would have to implement regardless of the technology you use.
My answer: considering the above, WCF Data Services is a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities, at least as long as you stay with EF. And you should keep in mind the huge amount of effort it saves you.
Can someone explain to me what the fundamental difference is between Core Data (apparently, a "data store") and a database like SQLite or MySQL?
I am working on writing an iPhone app and needed a table of static data to display. I thought Core Data would be a good choice for this, so I got everything set up and functioning as far as the database (I'm sorry - data STORE) went, and then went to try to import my data (it was in an Excel file which I exported to CSV). I was thinking it should be a straightforward process, like I have done in SQLite and other databases many times, but as it turned out, after much research, the only "official" way to do this was to write a parser specifically for my data.
When I asked about this on the Apple Developer forums, the response I got was basically "What kind of idiot are you to think that you could possibly import data directly without having to write code to do it? Core Data isn't a database - it's a data STORE!!" For the life of me, though, I can't see the distinction. In every way I have looked at it, Core Data behaves EXACTLY like a database, with a fancy way of accessing it and enough abstraction that it can use a variety of file formats for actually storing the data. In fact, I was eventually able to import my data using a simple SQLite .import command, so I really don't understand why the concept was so foreign to the responders to my original question.
So what am I missing here? What is so fundamentally different about a data store from a database that makes the concept of simple data importing completely alien to those who know the technology?
Core Data is not simply a means of persisting/storing data to and from disk, as SQL is. Core Data's true function is to provide the complete model layer for the Model-View-Controller app design that the Apple APIs use. As such, Core Data is primarily an object-graph manager with persistence options tacked onto the side.
An object-graph is a collection of live objects in memory. In Core Data, these are the managed objects. They are called "managed" objects because the managed object context observes the objects, constantly making sure they are in the states and relationships that the data model says they should be in.
Core Data does provide persistence options, but exactly what that option is for any particular implementation is largely hidden. You can even use the same data model and managed objects with different persistence methods, sometimes in the same app.
The key difference with SQL is that SQL writes the actual data to disk, whereas Core Data serializes live objects. When you look at a SQLite store used by Core Data, you are looking at objects that have been taken apart and "freeze-dried". Obviously, "freeze-drying" objects requires a rather specific data format in the SQLite store, so the Core Data store uses its own custom schema that is largely the same regardless of the details of the store.
That is why you can't just swap in any old SQL file and expect Core Data to import it. The SQL file is rows, tables and columns of data, and not the specialized tables, columns and rows used to reconstitute freeze-dried objects.
Since Core Data is first and foremost an object-graph manager, the only supported and reliable means of importing data is to create the object-graph. In the case of an SQL file, that means reading the SQL data using the SQL API, generating managed objects from that data, and then saving them to a persistent store.
That is more work up front, but you save time integrating the data into the rest of the app and upgrading the data, and you gain in reliability and maintainability.
A dictionary definition gives me:
Databases are data stores, but a data store isn't always a database.
The feature you expected isn't available in some databases either (though most have it).
A data store can for example store non-relational data.
They should have just pointed you at the Wikipedia article on Core Data.
According to that article, "It allows data organised by the relational entity-attribute model to be serialised into XML, binary, or SQLite stores. The data can be manipulated using higher level objects representing entities and their relationships. Core Data manages the serialised version, providing object lifecycle and object graph management, including persistence. Core Data interfaces directly with SQLite, insulating the developer from the underlying SQL."
I guess it's the fact that "Core Data manages the serialised version" that means you can't import data directly. That is, you probably can't import data directly into SQLite in such a way that Core Data can manage it, although you probably can import data directly into SQLite in some way.
Core Data is not a data store; a data store is one part of Core Data. Core Data is more closely related to an Object-Relational Mapping (ORM) tool. Core Data actually has the option of using SQLite for its data store, but you can also choose XML files or a proprietary binary format, or write your own data store.
I'm not sure how you were able to import your data with a SQLite import; it shouldn't be compatible with Core Data, since Core Data creates its own SQLite schema that contains a ton of metadata.
Maybe it's better to think of Core Data as an "object store" and a database as a "data store". Core Data is good when you have a variety of types of objects with relationships to each other. The familiar example is a company with employees, who have bosses and reports, belong to departments, are assigned to clients, projects, etc., have schedules, and go to meetings. Employees can get reassigned, and so on. Even the types of relationships defined vary from time to time. That's a more heavyweight process even with Core Data, but Core Data makes it easier than a raw database would.
If you just have "data", and not "objects", it's easier to use a database. For example if you just have a table of the elements with atomic weights, etc., you might want to just use a database.
For your application it sounds like you just have one table. It will be easy to just use SQLite, which is available, so use it if it's more convenient.
On the other hand, the iOS SDK has some pre-built features that interact with Core Data. If you use SQLite, you don't get those. So you might avoid custom code to import your data but have to write custom code to display your data. Tough luck. When creating software, sometimes you have to write code. Weird, I know.