Datomic ids in DataScript

I'm using datomic on the server side, with multiple reagent atoms on the client, and now looking at trying datascript on the client.
Currently, I'm passing across a nested structure via an initial api load, which contains the result of a datomic pull query. It's pretty concise, and works fine.
However, I'm now looking to explore the potential benefits of DataScript. The selling point is that it seems to let you retain normalisation right down to the attribute level. I've hit an initial hurdle, though. DataScript isn't, as I'd imagined (perhaps hoped...), a way to just take a subset of your Datomic db and replicate it on the client. The problem is that Datomic's entity ids can't simply be shared with DataScript: when you transact! entities into DataScript, a new eid (DataScript's own) is issued for each entity.
I haven't worked through all of the consequences yet, but it appears it would be necessary to store a :datomic-id in DataScript in addition to DataScript's own newly issued :db/id, and ref types are going to use DataScript's id, not Datomic's. This potentially complicates synchronisation back to Datomic, feels like it could create a lot of potential gotchas, and isn't as isomorphic as I'd hoped. But I'm still working on it. Can anyone share experience here? Maybe there's a solution...
Update:
I wonder if a solution is to ban the use of Datomic's :db/id on the client, enforcing this by filtering the ids out of the initial load and not passing them to the client at all. Any client -> server communication would then have to use the (server-generated) slugs instead, which are passed in the initial load.
So all entities would have different ids on the client, but because the server ids never reach the client, a client id accidentally passed back to the server should simply fail with an 'eid not found' error. There are likely more issues with this; I haven't worked it right through yet.
You also have to think in entities, not datoms, when passing data to and inserting it on the client, so as to create the correct refs there (or perhaps you could insert a tree, if you can wrangle that).
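By way of illustration, here's a minimal sketch (attribute names and slugs are invented) of transacting such a tree into DataScript, with the slug declared as a unique identity so client code never needs the Datomic id:

(require '[datascript.core :as d])

(def schema {:project/tasks {:db/valueType   :db.type/ref
                             :db/cardinality :db.cardinality/many}
             :task/slug     {:db/unique :db.unique/identity}})

(def conn (d/create-conn schema))

;; Nested maps become separate entities; DataScript issues its own eids
;; and wires up the refs.
(d/transact! conn
  [{:project/name  "alpha"
    :project/tasks [{:task/slug "task-1" :task/title "First task"}
                    {:task/slug "task-2" :task/title "Second task"}]}])

;; A lookup ref lets client code (and client -> server calls) use the slug
;; instead of any entity id:
(d/pull @conn '[:task/title] [:task/slug "task-1"])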
So I've discovered that the Datomic/DataScript partnership certainly isn't just a case of 'serialise a piece of your database'. That might work if you were using DataScript on the server as well, which is not the use case here at all (db persistence being required).

If I remember correctly, Datomic uses all 64 bits for entity ids, but JavaScript (and by extension DataScript) only has 53-bit integers. So some sort of translation layer is necessary either way; there's no way around it.
P.S. you can totally set :db/id to whatever you want in DataScript and it'll use that instead of generating its own. Just make sure it fits in 53 bits.
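For illustration, a minimal sketch of reusing a server-side id as DataScript's :db/id (the eid below is made up, but under 2^53):

(require '[datascript.core :as d])

(def conn (d/create-conn {}))

;; Reuse the Datomic eid as DataScript's :db/id, assuming it fits in
;; JavaScript's 53-bit safe-integer range.
(d/transact! conn
  [{:db/id 17592186045418          ; hypothetical Datomic eid, < 2^53
    :person/name "Alice"}])

(d/pull @conn '[*] 17592186045418)
;; => {:db/id 17592186045418, :person/name "Alice"}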

Related

Domain Driven Design, should I use multiple databases or a single source of truth

I'm about to propose some fundamental changes to my employers and would like the opinion of the community (opinion because I know that getting a solid answer to something like this is a bit far-fetched).
Context:
The platform I'm working on was built by a tech consultancy before I joined. While I was being onboarded they explained that they used DDD to build it, they have 2 domains, the client side and the admin side, each has its own database, its own GraphQl server, and its own back-end and front-end frameworks. The data between the tables is being synchronized through an http service that's triggered by the GraphQl server on row insertions, updates, and deletes.
Problem:
All of the data present in the client domain is also found in the admin domain; there's no domain-specific data there. Synchronization is a mess and is buggy. The team isn't large enough to manage all the resources and keep track of the different schemas.
Proposal:
Remove the client database and GraphQl servers, have a single source of truth database for all the current and potentially future applications. Rethink the schema, split the tables that need to be split, consolidate the ones that should be joined, and create new tables according to the actual current business flow.
Am I justified in my proposal, or was the tech consultancy doing the right thing and I'm sending us backwards?
Normally you have a database, or schema, for each separate bounded context. That means that the initial idea of the consultancy company was correct.
What's not correct is the way the consistency between the two is managed. You don't do it on table changes but with services inside one (or both) of the domains listening to the events and taking the update actions. It's a lot of work anyway, because you have to update the event handlers on every change (in the events or in the table structure).
This code is what's called an anti-corruption layer, and that's exactly what it does: it prevents any corruption between a domain and the copy of its data held in another domain.
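As a rough illustration (the event shape, attribute names and in-memory store here are assumptions, not your actual stack), such a listening handler might look like this in Clojure:

;; The admin domain publishes events; the client domain listens and keeps
;; its own translated copy. An atom stands in for the client-side store.
(def client-customers (atom {}))

(defn admin->client-customer
  "Translate the admin domain's representation into the client domain's model."
  [{:keys [id display-name]}]
  {:customer/id id :customer/name display-name})

(defn handle-admin-event
  [{:keys [type payload]}]
  (case type
    :customer/updated (swap! client-customers assoc (:id payload)
                             (admin->client-customer payload))
    :customer/deleted (swap! client-customers dissoc (:id payload))
    nil))                         ; events the client domain doesn't care about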
That said, as you pointed out, your team is small, and maintaining such a layer (and hence code) could cost a lot of energy. But you also have to remember that once it's done, you only have to update it when needed.
Anyway, back to the proposal: you could also take this route. What you should (must, I would say) ensure is that in each domain the external tables are accessed only through a few services, or queries, and that this code never, ever modifies the content it accesses. Never. But I suppose you already know this.
Nothing is written in stone; the rules should always be adapted to the real context. Two separate databases mean more work, but also a much better separation of the domains: it can never happen that someone accidentally modifies the content of the other domain's tables. On the other hand, one database means less work, but also much more care about what the code does.

Message storage duplication for messaging systems

In many sub-system designs for messaging applications (Twitter, Facebook, etc.) I notice duplication in where user message history is stored. On one hand, they use a tokenizing indexer like Elasticsearch or Solr, which is good for search. On the other hand, they still use some sort of DB for history. Why duplicate? Why can't the same instance of ES/Solr/EarlyBird be used for history? It is, in fact, able to do it.
The usual problem is the following: you want to search, and ideally you also want to be able to index the data in a different manner (e.g. wipe the index and try a new awesome analyzer that you forgot to include initially). Separating the data source and the index from each other makes the system less coupled, and you're not afraid that you will lose data in Elasticsearch/Solr.
I am usually strongly against calling Elasticsearch/Solr a database, since in fact it's not. For example, neither of them has support for transactions, which makes your life harder if you want to update multiple documents following standard relational logic.
Last, but not least: one of the hardest operations in Elasticsearch/Solr is retrieving stored values, since they aren't much optimised for it, especially if you want to return 10k documents at once. In this case a separate datasource also helps, since you can return only the matched document ids from Elasticsearch/Solr and later retrieve the needed content from the datasource and return it to the user.
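As a rough sketch of that pattern (the index name, field names and the final fetch function are assumptions), using Clojure with clj-http and cheshire:

(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(defn search-message-ids
  "Ask Elasticsearch only for the matching ids, not the stored documents."
  [query-text]
  (let [resp (http/post "http://localhost:9200/messages/_search"
                        {:content-type :json
                         :as :json
                         :body (json/generate-string
                                {:_source false            ; ids only
                                 :size    50
                                 :query   {:match {:body query-text}}})})]
    (map :_id (get-in resp [:body :hits :hits]))))

;; The matching content would then be loaded from the primary store, e.g.
;; (fetch-messages-by-ids db (search-message-ids "deploy failed"))
;; where fetch-messages-by-ids is whatever your datasource layer provides.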
The summary is simple: Elasticsearch/Solr should be thought of as search engines, not data storage.
True that ES is NOT a database per se and will never be. But no one says you cannot use it as such, and many people actually do. It really depends on your specific use case(s), and in the end it's all a question of the trade-offs you are ready to make to support your specific needs. As with pretty much any technology in general, there is no one-size-fits-all approach and with ES (and the like) it's no different.
A primary source of truth isn't necessarily a relational DBMS, and they are not necessarily "duplicating" the data in the sense that you meant; it can be anything that has a copy of your data and allows you to rebuild your ES indexes in case something goes wrong. I've seen many, many different "sources of truth". It could simply be:
your raw flat files containing your historical logs or business data
Kafka topics that you can replay anytime easily
a snapshot that you take from ES on a regular basis
a relational DB
you name it...
The point is that if something goes wrong for any reason (and that happens), you want to be able to recreate your ES indexes, be it from a real DB, from backups or from raw data. You should see that as a safety net. Even if all you have is a MySQL DB, you usually have a backup of it, so you're already "duplicating" the data in some way.
One thing that you need to think about, though, when architecting your system, is that you might not need the entirety of your data in ES. Since ES is a search and analytics engine, you should only store in it what is necessary to support your search and analytics needs, and you should be able to recreate that information anytime. In the end, ES is just a subsystem of your whole architecture, just like your DB, your messaging queue or your web server.
Also worth reading: Using Elasticsearch as primary source for part of my DB

Redis database snapshot diffs or other suggested DB for network/resource monitoring

I have a monitoring service that polls a REST API for information about the latest resources (a list of hosts / a list of licenses). The monitoring service caches all this data in a Redis database. Everything works great for discovering new resources.
However, the problem I am facing is when a host drops off the network: I have no way of knowing that the host has disappeared from the list of hosts. The REST API only gives me a way of querying the current list of hosts.
One way I can come up with (theoretically) is taking a diff of the RDB snapshot at different time intervals. However, this does not seem efficient to me and, honestly, I am not sure how I would do this with Redis.
The suggestions I am looking for are perhaps frameworks best suited for this kind of operation or, if need be, a different database that is as efficient as Redis yet gives me the functionality I need to take diffs. Time-series databases spring to mind, but I have no experience with them and am not sure how they can be used to solve this problem precisely.
There's no need to resort to anything besides Redis itself - it is robust enough to keep serving your requirements, as long as you tell it what to do (like any other software ;)).
The following is an example, but as you didn't specify how you're caching your data, I'll assume for simplicity's sake that you have a key for every host/license in your list where you store some string/binary value, like:
SET acme.org "some cached value"
You have a lot of such keys because the monitoring REST API returns a list, so a common way to keep everything in order is to use another key that stores the list for each request returned by the API. You can achieve that with a Set:
SADD request:<timestamp> acme.org foo.bar ...
Sets are particularly useful here because you can perform Set operations - SDIFF and SINTER, or the STORE variants in your case - to keep track of the currently online and the dropped hosts. For example:
MULTI
SINTERSTORE online:<timestamp> request:<timestamp> request:<previous-timestamp>
SDIFFSTORE dropped:<timestamp> request:<previous-timestamp> request:<timestamp>
EXEC
Note: as you're caching things, it is good practice to set expiry (TTL) values on all relevant keys and to use an appropriate eviction policy.
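A rough sketch of the same flow from Clojure using the carmine client (connection details and key layout are assumptions following the example above):

(require '[taoensso.carmine :as car :refer [wcar]])

(def conn {:pool {} :spec {:host "127.0.0.1" :port 6379}})

(defn record-poll!
  "Cache the hosts from the current poll and derive online/dropped sets."
  [prev-ts ts hosts]
  (wcar conn
    (apply car/sadd (str "request:" ts) hosts)
    ;; still online: present in both the previous and the current poll
    (car/sinterstore (str "online:" ts)
                     (str "request:" ts) (str "request:" prev-ts))
    ;; dropped: present in the previous poll but missing from the current one
    (car/sdiffstore (str "dropped:" ts)
                    (str "request:" prev-ts) (str "request:" ts))
    ;; expire the cached poll so old keys don't accumulate
    (car/expire (str "request:" ts) 86400)))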

Clojure database interaction - application design/approach

I hope this question isn't too general and that it makes sense.
I'm currently developing a basic application that talks to an SQLite database, so naturally I'm using the clojure.java.jdbc library (link) to interact with the DB.
The trouble is, as far as I can tell, the way you insert data into the DB using this library is by simply passing a map (e.g. {:id 1 :name "stackoverflow"}) and a table name (e.g. :website).
The thing I'm concerned about is how I can make this more robust in the wider context of my application. What I mean by this is that when I write data to the database and retrieve it, I want to use the same formatted map EVERYWHERE in the application, from the data access layer (returning or passing in maps) all the way up to the application layer where it works on the data and passes it back down again.
What I'm trying to get at is, is there an 'idiomatic' clojure equivalent of JavaBeans?
The problem I'm having right now is having to repeat myself by defining maps manually with column names etc - but if I change the structure of my table in the DB, my whole application has to be changed.
As far as I know, there really isn't such a library. There are various systems that make it easier to write queries, but not, AFAIK, anything that "fixes" your data objects.
I've messed around trying to write something like you propose myself but I abandoned the project since it became very obvious very quickly that this is not at all the right thing to do in a clojure system (and actually, I tend to think now that the approach has only very limited use even in languages that have really "fixed" data structures).
Issues with the clojure collection system:
All the map access/alteration functions are really functional. That means that alterations on a map always return a new object, so it's nearly impossible to create a forcibly fixed map type that's also easy to use in idiomatic clojure.
General conceptual issues:
Your assumption that you can "use the same formatted map EVERYWHERE in the application, so from the data access layer (returning or passing in maps) all the way up to the application layer where it works on the data and passes it back down again" is wrong if your system is even slightly complex. At best, you can use the map from the DB up to the UI in some simple cases, but the other way around is pretty much always the wrong approach.
Almost every query will have its own result row "type"; you're probably not going to be able to re-use these "types" across queries even in related code.
Also, forcing these types on the rest of the program is probably binding your application more strictly to the DB schema. If your business logic functions are sane and well written, they should only access as much data as they need and no more; they should probably not use the same data format everywhere.
My serious answer is: don't bother. Write your DB access functions for the kinds of queries you want to run, and let those functions check the values moving in and out of the DB in as much detail as you find comforting. Do not try to forcefully keep the data coming from the DB "the same" in the rest of your application. Use assertions and pre/post conditions if you want to check your data format in the rest of the application.
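As a small sketch of that advice with clojure.java.jdbc (the connection details, table and column names are just examples), one access function per query, each checking only the shape it cares about:

(require '[clojure.java.jdbc :as jdbc])

(def db-spec {:dbtype "sqlite" :dbname "app.db"})    ; assumed connection details

(defn insert-website!
  [{:keys [id name] :as website}]
  {:pre [(integer? id) (string? name)]}               ; validate on the way in
  (jdbc/insert! db-spec :website website))

(defn website-by-id
  [id]
  {:post [(or (nil? %) (contains? % :name))]}         ; validate on the way out
  (first (jdbc/query db-spec ["select * from website where id = ?" id])))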
Clojure favours the concept of a few data structures and lots of functions that work on those few data structures. There are a few ways to create new data structures (which, I guess, internally use the basic data structures), like defrecord etc. But even if you are able to use them, that won't really solve the problem of making DB schema changes affect the code less, because you will eventually have to go through the layers to add/remove the effects of the schema changes: anywhere you read or create that data needs to be changed.

How to restrict or filter database access according to application user attributes

I've thought about this too much now with no obviously correct solution. It might be a real wood-for-the-trees situation, so I need stackoverflow's help.
I'm trying to enforce database filtering on a regional basis. My system has various users and each one is assigned to a regional office. I only want users to be able to see data that is associated with their regional office.
Put simply my application is: Java App -> JPA (hibernate) -> MySQL
The database contains objects from all regions, but I only want the users to be able to manipulate objects from their own region. I've thought about the following ways of doing it:
1) Modify all database queries so they read something like select * from tablex where region="myregion". This is nasty. It doesn't work too well with JPA, e.g. the entitymanager.find() method only accepts a primary key. Of course I can go native, but I only have to miss one select statement and my security is shot.
2) Use a MySQL proxy to filter results. Kind of funky, but the proxy just sees the raw call and doesn't really know how it should be filtering it (i.e. which region the user that made the request belongs to). OK, I could start a proxy for each region, but it starts getting a little messy.
3) Use separate schemas for each region. Simple, and since I'm using Spring I could use a RoutingDataSource to route requests to the correct datasource (one datasource per schema). Of course, the problem now is that somewhere down the line I'm going to want to filter by region and some other category. Oops.
4) ACL - not really sure about this. If I did a select * from tablex, would it quietly filter out objects I don't have access to, or would a load of access exceptions be thrown?
But am I thinking too much about this? This seems like a really common problem. There must be some easy solution I'm just too dumb to see. I'm sure it'll be something close to, or in, the database, as you want to filter as near to the source as possible, but what?
Not looking to be spoonfed - any links, keywords, ideas, or commercial/open-source product suggestions would be really appreciated!! Thanks.
I've just been implementing something similar (REALbasic talking to MySQL) over the last couple of weeks for a hierarchical multi-company extension to an accounting package.
There's a large body of existing code which composes SQL statements, so we had to live with that and just do a lot of auditing to ensure the restrictions were included in each table as appropriate. One gotcha was related lookups, where lookup tables were normally only used in combination with a primary table, but some maintenance GUIs would load the lookup table itself, directly.
There's a danger of giving away implied information such as revealing that Acme Pornstars are a client of some division of the company ;-)
The only solution for that part was very careful construction of DB diagrams to show all implied relationships and lots of auditing and grepping source code, with careful commenting to indicate areas which had been OK'd as not needing additional restrictions.
The one pattern I've come up with to make this more generalised in future is, rather than explicit region=currentRegionVar type searches, using an arbitrary entityID which is supplied by a global CurrentEntityForRole("blah") function.
This abstraction allows for sharing of some data as well as implementing pseudo-entities which represent other restriction boundaries.
I don't know enough about Java and Spring to be able to tell but is there a way you could use views to provide a single-key lookup, where the views are restricted by the region filter?
The desire to provide aggregations and possible data sharing was why we didn't go down the separate database route.
Good Question.
Seems like #1 is the best since it's the most flexible.
Region happens to be what you're filtering on today, but it could be region + department + colour of hair tomorrow.
If you start carving up the data too much it seems like you'll be stuck working harder than necessary to glue them all back together for reporting.
I am having the same problem. It is hard to believe that such a common task (filtering a list of model entities based on the user profile) doesn't have a 'standard' way, pattern or best practice for doing it.
I've found pgacl, a PostgreSQL module. Basically, you do your query like you normally would, and then you tack on an acl_access() predicate to work as a filter.
Maybe there is something similar for MySQL.
I suggest you use ACLs. They are more flexible than the other choices. Use Spring Security; you can use it without using the Spring Framework. Read the tutorial from link text.
