I hope this question isn't too general or doesn't make sense.
I'm currently developing a basic application that talks to an SQLite database, so naturally I'm using the clojure.java.jdbc library to interact with the DB.
The trouble is, as far as I can tell, the way you insert data into the DB using this library is by simply passing a map (e.g. {:id 1 :name "stackoverflow"}) and a table name (e.g. :website).
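For example, usage looks roughly like this (a minimal sketch on my part; the db-spec assumes the SQLite JDBC driver is on the classpath, and the table/column names are just examples):

    (require '[clojure.java.jdbc :as jdbc])

    (def db {:dbtype "sqlite" :dbname "app.db"})

    ;; insert a row by passing a plain map and a table name
    (jdbc/insert! db :website {:id 1 :name "stackoverflow"})

    ;; queries come back as a seq of plain maps
    (jdbc/query db ["SELECT * FROM website WHERE id = ?" 1])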
The thing that I'm concerned about is how I can make this more robust in the wider context of my application. What I mean by this is that when I write data to the database and retrieve it, I want to use the same formatted map EVERYWHERE in the application, so from the data access layer (returning or passing in maps) all the way up to the application layer where it works on the data and passes it back down again.
What I'm trying to get at is: is there an 'idiomatic' Clojure equivalent of JavaBeans?
The problem I'm having right now is having to repeat myself by defining maps manually with column names etc. - but if I change the structure of a table in the DB, my whole application has to be changed.
As far as I know, there really isn't such a library. There are various systems that make it easier to write queries, but not, AFAIK, anything that "fixes" your data objects.
I've messed around with writing something like you propose myself, but I abandoned the project since it became very obvious very quickly that this is not at all the right thing to do in a Clojure system (and actually, I now tend to think that the approach has only very limited use even in languages that do have really "fixed" data structures).
Issues with the Clojure collection system:
All the map access/alteration functions are really functional. That means that alterations on a map always return a new object, so it's nearly impossible to create a forcibly fixed map type that's also easy to use in idiomatic Clojure.
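For example:

    ;; assoc never mutates; it returns a new map and leaves the original alone
    (def row {:id 1 :name "stackoverflow"})

    (assoc row :name "programmers") ;=> {:id 1, :name "programmers"}
    row                             ;=> {:id 1, :name "stackoverflow"}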
General conceptual issues:
Your assumption that you can "use the same formatted map EVERYWHERE in the application, so from the data access layer (returning or passing in maps) all the way up to the application layer where it works on the data and passes it back down again" is wrong if your system is even slightly complex. At best, you can use the map from the DB up to the UI in some simple cases, but the other way around is pretty much always the wrong approach.
Almost every query will have its own result row "type"; you're probably not going to be able to re-use these "types" across queries, even in related code.
Also, forcing these types on the rest of the program is probably binding your application more strictly to the DB schema. If your business logic functions are sane and well written, they should only access as much data as they need and no more; they should probably not use the same data format everywhere.
My serious answer is: don't bother. Write your DB access functions for the kinds of queries you want to run, and let those functions check the values moving in and out of the DB in as much detail as you find comforting. Do not try to forcefully keep the data coming from the DB "the same" in the rest of your application. Use assertions and pre/post conditions if you want to check your data format in the rest of the application.
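For instance, something like this (the function and keys here are made up, just to show the shape of :pre/:post):

    ;; a hypothetical business function that checks only the keys it needs
    (defn rename-website [row new-name]
      {:pre  [(map? row) (string? new-name)]
       :post [(= new-name (:name %))]}
      (assoc row :name new-name))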
Clojure favours the concept of a few data structures and lots of functions that work on those few data structures. There are a few ways to create new data structures (which I guess internally use the basic ones), like defrecord etc. But even if you use them, that won't really solve the problem of making DB schema changes affect the code less: you will eventually have to go through the layers to add or remove the effects of a schema change, because anywhere you read or create that data needs to be changed.
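For example (a sketch):

    (defrecord Website [id name])

    (def row (->Website 1 "stackoverflow"))

    (:name row) ;=> "stackoverflow" - records still behave like maps
    ;; "updates" still return new values, and renaming a field in Website
    ;; still ripples out to every (->Website ...) call site
    (assoc row :visits 42)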
Related
I understand that we encapsulate data to prevent things from being accessed that don't need to be accessed by developers working with my code. However, I only program as a hobby and do not release any of my code to be used by other people. I still encapsulate, but it mostly just seems like I'm doing it for the sake of good policy and building the habit. So, is there any reason to encapsulate data when I know I am the only one who will be using my code?
Encapsulation is not only about hiding data.
It is also about hiding the details of implementation.
When such details are hidden, everyone is forced to use the class's defined API, and the class is the only one that can change what's inside.
So just imagine a situation where you have opened all methods up to any class interested in them, and you have a function that performs some calculation. Then you realize that you want to replace it, because the logic is not right, or you want to perform some more complicated calculation.
In such cases you sometimes have to change all the places across your application to change the result, instead of changing it in only one place: the API that you provided.
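To sketch that in Clojure (the language from the first question on this page; the names here are invented), the same principle holds even without classes: keep the calculation private and expose a single function as the API, so there is exactly one place to change it:

    (ns billing.core)

    ;; private implementation detail - free to rewrite later
    (defn- base-rate [claim]
      (* 0.05 (:amount claim)))

    ;; the public API: the only thing the rest of the app calls
    (defn premium [claim]
      (base-rate claim))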
So don't make everything public; it leads to strong coupling and pain during the update process.
Encapsulation is not only creating "getters" and "setters", but also exposing a sort of API to access the data (if needed).
Encapsulation lets you keep access to the data in one place and allows you to manage it in a more "abstract" way, reducing errors and making your code more maintainable.
If your personal projects are simple and small, you can do whatever you feel like in order to produce fast what you need, but bear in mind the consequences ;)
I don't think unnecessary data access can happen only with third-party developers. It can happen with you as well, right? When you allow direct access to data through access rights on variables/properties, whoever is working with it, be it you or someone else, may end up creating bugs by accessing the data directly.
I've been looking at the Baobab library and am very attracted to the "single-tree" approach, which I interpret as essentially a single store. But so many Flux tutorials seem to advocate many stores, even a "store per entity." Having multiple stores seems to me to present all kinds of concurrency issues. My question is: why is a single store a bad idea?
It depends on what you want to do and how big your project is. There are a few reasons why having several stores is a good idea:
If your project is not so small after all, you may end up with a huge 2,000-3,000-line store, and you don't want that. That's the point of writing modules in general: you want to avoid files bigger than 1,000 lines (and below 500 is even nicer :) ).
Writing everything in one store means you can't enjoy the dependency management the dispatcher gives you through the waitFor function. It's going to be harder to check dependencies and potential circular dependencies between your models (since they are all in one store). I would suggest you take a look at https://facebook.github.io/flux/docs/chat.html for that.
It's harder to read. With several stores you can figure out at a glance what types of data you have, and with a constants file for the dispatcher events you can see all your events.
So it's possible to keep everything in one store, and it may work perfectly, but if your project grows you may regret it badly and rewrite everything as several modules/stores. Just my opinion: I prefer to have clean modules and data workflows.
Hope it helps!
From my experience, working with a single store is definitely not a bad idea. It has some advantages, such as:
A single store to access all data can make it easier to query and make relationships about different pieces of data. Using multiple stores can make this a little bit more difficult (definitely not impossible though).
It will be easier to make atomic updates to the application state (aka data store).
But the way you implement the Flux pattern will influence your experience with a single data store. The folks at Facebook have been experimenting with this, and it seems like they encourage the use of a single data store with their new Relay+GraphQL stuff (read more about it here: http://facebook.github.io/react/blog/2015/02/20/introducing-relay-and-graphql.html).
I do a lot of untyped DataSet work in my projects and have done so for a while, but when working with in-place editing in a DataGridView, I found that validation and the like are a lot easier if you use a typed DataSet.
This poses an issue, though, because I don't like using those dataset designers to create strongly typed datatables/datasets. Making simple changes down the road is harder when the dataset is typed than when it's untyped: typed dataset changes require a copy of VS to be installed, whereas untyped ones don't. I can change a SQL view on the DB server and the apps will show the new column in my grid. They may not be able to use the new column, but most of my stuff is info display, so that's OK.
I looked at Entity Framework, but it too looks like a few wizards must be run to build your data model. I'm not against a data model, but it would be great if it were generated at runtime so that changes to the DB don't require recompiling the software.
Is there a happy medium? Or am I stuck creating untyped datatables at startup for a while longer?
It's all a matter of taste, of course, but I find that Datasets are the root of all evil.
well, maybe not all evil, but they represent a data structure that has no behavior associated with it => they are not objects (as defined in OOP) => using them promotes non-OOP style programming (not to mention procedural programming).
some other points:
gridviews and any other control fully support binding to a list of objects (and not just datasets).
I think that, if you have to make changes to your data model, having a copy of VS installed is not too much to ask.
along the same lines - it's also not an exaggerated requirement to have to recompile your code when you make changes to your data model.
if, when you change a table in your db, that forces a change on your UI, I would say it's not "loosely coupled" by any stretch of the imagination.
I believe that the only justification for using datasets is to pull data out of your db and then transfer it into your objects.
but now, as you know, if that isn't necessary - you have ORMs that do that job for you (EF is one; nHibernate is another, better option).
so, in conclusion - I strongly recommend you reconsider your use of DataSets, as they go against the very basics of object orientation.
p.s.
sorry if this came across as a little emotional - I was talking from bitter personal experience.
I was pulling my hair out for 2 years because the app I was working on had used DataSets all over, and that meant that we had to duplicate the behavior for that data all over as well. uughh....
I'm working on an application for one of our departments that contains medical data. It interfaces with a third party system that we have here.
The object itself (a claim) isn't terribly complex, but due to the nature of the data and the organization of the database, retrieving the claim data is very complex. I cannot simply join all the tables together and get the data. I need to do a "base" query to get the basics of the claim, and then piece together supplemental data about the claim based on various issues.
When working with this data, would it be better to:
Generate the object in a stored procedure, where all of the relevant data is readily available, and iterate through a table variable (using SQL Server 2005) to piece together all the supplemental information.
Generate the object in the data access layer, where I have a little more powerful data manipulation at my disposal, and make a bunch of quick and simple calls to retrieve the lookup data.
Use an OR/M tool and map out all the complex situations to generate the object.
Something else.
EDIT: Just to clarify some of the issues listed below. The complexity really isn't a business issue. If a claim has a type code of "UB", then I have to pull some of the supplemental data from Table X. If the claim has a type code of "HCFA", then I have to pull some of the data from Table Y. It is those types of things. I hope this helps.
One more vote for stored procedures in this case.
What you are trying to model is a very specific piece of business logic ("what is a claim") that needs to be consistent across any application that deals with the concept of a claim.
If you only ever have one application, or multiple applications using the same middleware, you can put that in the client code; however, practice shows that databases tend to outlive software that accesses them.
You do not want to wind up in a situation where subtle bugs and corner cases in redundant implementations make different applications see the data in slightly different ways. DRY, and all that.
I would use a stored procedure for security reasons. You don't have to give SELECT privileges to the claims tables that you are using, which sound somewhat important. You only have to give the user access to that stored procedure. If the database user already has SELECT privileges on the tables, I don't see anything wrong with generating the object in the data access layer either. Just be consistent with whatever option you choose. If you are using stored procedures elsewhere, continue to use them here. The same applies to generating the objects in the data access layer.
Push decisions/business logic as high up in your application's code hierarchy as possible. ORMs/stored procedures are fine but cannot be as efficient as hand-written queries. The higher up in your code you go, the more you know what the data will be used for, and the more information you have to retrieve it intelligently.
I'm not a fan of pushing business logic down to the persistence layer, so I wouldn't recommend option 1. The approach I'd take involves a well-defined program object that models the underlying database entity, so it is ORM-oriented; but your option 3 sounds like you're thinking of the mapping as an onerous task, which I really don't. I'd just set up the logic necessary for loading whatever you're concerned about in methods on the program object that models it.
As a general rule, I use a data access layer just to retrieve data (possibly from different sources) and return it in a meaningful manner.
Anything that requires business rules or logic (decisions) goes in my business layer.
I do not deviate from that choice lightly*.
It sounds like the claim you are generating is really a view of data stored in various places, without any decisions or business logic. If that's the case, I would tend to manage access to that data in the data layer.
*I'll never forget one huge system I worked on that got very over-complicated because the only person available to work on a central piece was an expert at stored procedures... so lots of the business logic ended up there.
Think of the different ways you're planning to consume the data. The whole purpose of an application layer is to make your life easier. If it doesn't, I agree with #hoffmandirt that it's safer in the database.
Stored procedures are bad, m'kay?
It sounds like views would be better than stored procedures in this case.
If you are using .NET, I would highly recommend going with an ORM to get support for Linq.
In general, spreading business logic between the database and application code is not a good idea.
In the end, any solution will likely work. You aren't facing a make or break type decision. Just get moving, don't get hung up on this kind of issue.
I'm writing a CAD (Computer-Aided Design) application. I'll need to ship a library of 3d objects with this product. These are simple objects made up of nothing more than 3d coordinates and there are going to be no more than about 300 of them.
I'm considering using a relational database for this purpose. But given my simple needs, I don't want anything complicated. So far, I'm leaning towards SQLite: it's small, runs within the client process, and is claimed to be fast. Besides, I'm a poor guy, and it's free.
But before I commit myself to SQLite, I just wish to ask your opinion on whether it is a good choice given my requirements. Also, is there any equivalent alternative that I should try as well before making a decision?
Edit:
I failed to mention earlier that the CAD objects I'll ship are not going to be immutable. I expect the user to edit them (change dimensions, colors, etc.) and save them back to the library. I also expect users to add their own newly created objects. Kindly consider this in your answers.
(Thanks for the answers so far.)
The real thing to consider is what your program does with the data. Relational databases are designed to handle complex relationships between sets of data. However, they're not designed to perform complex calculations.
Also, the amount of data and relative simplicity of it suggests to me that you could simply use a flat file to store the coordinates and read them into memory when needed. This way you can design your data structures to more closely reflect how you're going to be using this data, rather than how you're going to store it.
Many languages provide a mechanism, called serialization, to write data structures to a file and read them back in again. Python's pickle is one such library, and I'm sure you can find one for whatever language you use. Basically, just design your classes or data structures as dictated by how they're used by your program, and use one of these serialization libraries to populate instances of that class or data structure.
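To sketch the pattern in Clojure (the language from the first question on this page; the names and coordinates are invented), plain data structures round-trip through a file in a couple of lines - pickle in Python or the .NET serializers play the same role:

    (require '[clojure.edn :as edn])

    ;; a small "library" of 3D objects as plain data
    (def objects [{:name "cube"
                   :points [[0 0 0] [1 0 0] [1 1 0] [0 1 0]]}])

    ;; write the structures out...
    (spit "objects.edn" (pr-str objects))

    ;; ...and read them back in
    (def loaded (edn/read-string (slurp "objects.edn")))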
edit: The requirement that the structures be mutable doesn't really change my answer - I still think that serialization and deserialization are the best solution to this problem. The fact that users need to be able to modify and save the structures necessitates a bit of planning to ensure that the files are updated completely and correctly, but ultimately I think you'll spend less time and effort with this approach than trying to marshal SQLite or another embedded database into doing this job for you.
The only case in which a database would be better is if you have a system where multiple users are interacting with and updating a central data repository, and for a case like that you'd be looking at a database server like MySQL, PostgreSQL, or SQL Server for both speed and concurrency.
You also commented that you're going to be using C# as your language. .NET has support for serialization built in, so you should be good to go.
I suggest you consider using H2; it's really lightweight and fast.
When you say you'll have a library of 300 3D objects, I'll assume you mean objects for your code, not models that users will create.
I've read that object databases are well suited to help with CAD problems, because they're perfect for chasing down long reference chains that are characteristic of complex models. Perhaps something like db4o would be useful in your context.
How many objects are you shipping? Can you define each of these objects and their coordinates in an XML file - basically, a distinct XML file for each object? You could place these XML files in a directory. That would be a simple structure.
I would not use a SQL database. You can easily describe every 3D object with an XML file. Put these files in a directory and pack (zip) them all. If you need easy access to the metadata of the objects, you can generate an index file (with only names or descriptions) so that not every object must be parsed and loaded into memory (nice if you have something like a library manager).
There are quick and easy SAX parsers available, and you can easily write an XML writer (or find some free code to use for this).
Many similar applications use XML today. It's easy to parse and write, human readable, and doesn't need much space when zipped.
I have used SQLite; it's easy to use and easy to integrate with your own objects. But I would prefer a SQL database like SQLite for applications where you need good search tools over a huge number of data records.
For the specific requirement, i.e. to provide a library of objects shipped with the application, a database system is probably not the right answer.
The first thing that springs to mind is that you probably want the file to be updatable, i.e. you need to be able to drop an updated file into the application without changing the rest of the application.
The second thing is that the data you're shipping is immutable - for this purpose, therefore, you don't need the capabilities of a relational DB, just the ability to access a particular model with adequate efficiency.
For simplicity (sort of), an XML file would do nicely, as you've got good structure. Using that as a basis, you can then choose to compress it, encrypt it, embed it as a resource in an assembly (if one were playing in .NET), and so on.
Obviously, if SQLite stores its data in a single file per database and you have other reasons to need the capabilities of a DB in your storage system, then yes - but I'd want to think about the utility of the DB to the app as a whole first.
SQL Server CE is free, has a small footprint (no service running), and is SQL Server compatible.