I have a database with quite a lot of entities, and I want to preload data from a file on first creation of the database. For that, the Room schema needs to match the schema of the database file. Since converting the JSON schema to SQLite statements by hand is very error-prone (I would need to copy and paste every single statement and swap out the variable names), I am looking for a way to automatically generate a database from the schema, which I would then just need to fill with the data.
However, there apparently is no information out on the internet about whether that is possible, or how to do it. It's my first time working with SQLite (normally I use MySQL), and also the first time I have seen a database schema in JSON, since the standard MariaDB export options always just export the CREATE TABLE statements.
Is there a way? Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?
I followed the guide in the Android developer documentation to get the JSON schema, so I have that file already. For those who do not know its structure, it looks like this:
{
  "formatVersion": 1,
  "database": {
    "version": 1,
    "identityHash": "someAwesomeHash",
    "entities": [
      {
        "tableName": "Articles",
        "createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`id` INTEGER NOT NULL, `germanArticle` TEXT NOT NULL, `frenchArticle` TEXT, PRIMARY KEY(`id`))",
        "fields": [
          {
            "fieldPath": "id",
            "columnName": "id",
            "affinity": "INTEGER",
            "notNull": true
          },
          {
            "fieldPath": "germanArticle",
            "columnName": "germanArticle",
            "affinity": "TEXT",
            "notNull": true
          },
          {
            "fieldPath": "frenchArticle",
            "columnName": "frenchArticle",
            "affinity": "TEXT",
            "notNull": false
          }
        ],
        "primaryKey": {
          "columnNames": [
            "id"
          ],
          "autoGenerate": false
        },
        "indices": [
          {
            "name": "index_Articles_germanArticle",
            "unique": true,
            "columnNames": [
              "germanArticle"
            ],
            "createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` ON `${TABLE_NAME}` (`germanArticle`)"
          },
          {
            "name": "index_Articles_frenchArticle",
            "unique": true,
            "columnNames": [
              "frenchArticle"
            ],
            "createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_frenchArticle` ON `${TABLE_NAME}` (`frenchArticle`)"
          }
        ],
        "foreignKeys": []
      },
      ...
Note: My question was not how to create the Room DB from the schema. To obtain the schema, I already had to create all the entities and the database. It was how to get the structure Room creates, as SQL, to prepopulate my database. However, I think the answer is a really nice explanation, and in fact I found the SQL statements I was searching for in the generated Java file, which was an awesome hint. ;)
Is there a way? Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?
You cannot simply provide the CREATE SQL to Room; what you need to do is generate the Java/Kotlin classes (entities) from the JSON and then add those classes to the project.
Native SQLite (i.e. not using Room) would be a different matter, as it could be done at runtime.
The way Room works is that the database is generated, at compile time, from the classes annotated with @Entity.
The entity classes have to exist for the compiler to generate the correct code.
Furthermore, the entities have to be included in a class for the database, annotated with @Database (this class is typically abstract).
On top of that, to access the database tables you have abstract classes or interfaces for the SQL, each annotated with @Dao; again, these require the entity classes, as the SQL is checked at compile time.
E.g. the JSON you provided would equate to something like :-
@Entity(
    indices = {
        @Index(value = "germanArticle", name = "index_Articles_germanArticle", unique = true),
        @Index(value = "frenchArticle", name = "index_Articles_frenchArticle", unique = true)
    },
    primaryKeys = {"id"}
)
public class Articles {
    //@PrimaryKey // Could use this as an alternative
    long id;
    @NonNull
    String germanArticle;
    String frenchArticle;
}
So your process would have to convert the JSON to create the above, which could then be copied into the project.
You would then need a class for the database, which could for example be :-
@Database(entities = {Articles.class}, version = 1)
abstract class MyDatabase extends RoomDatabase {
}
Note that Dao classes would be added to the body of the above, along the lines of :-
abstract MyDaoClass getDao();
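For completeness, a minimal Dao might look like the following; the name MyDaoClass and the queries are placeholders, not something the question prescribes :-

import java.util.List;

import androidx.room.Dao;
import androidx.room.Insert;
import androidx.room.Query;

// Hypothetical Dao for the Articles entity above.
@Dao
public abstract class MyDaoClass {
    // Room verifies this SQL against the Articles entity at compile time.
    @Query("SELECT * FROM Articles")
    public abstract List<Articles> getAllArticles();

    // Returns the rowid of the inserted row.
    @Insert
    public abstract long insert(Articles article);
}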
Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?
Yes it does ....
At this stage, if you compile, Room generates Java (MyDatabase_Impl for the above, i.e. the name of the database class suffixed with _Impl). However, as there are no Dao classes/interfaces, the database would be unusable from a Room perspective (and thus wouldn't even get created).
Part of the generated code would be :-
@Override
public void createAllTables(SupportSQLiteDatabase _db) {
    _db.execSQL("CREATE TABLE IF NOT EXISTS `Articles` (`id` INTEGER NOT NULL, `germanArticle` TEXT NOT NULL, `frenchArticle` TEXT, PRIMARY KEY(`id`))");
    _db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` ON `Articles` (`germanArticle`)");
    _db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_frenchArticle` ON `Articles` (`frenchArticle`)");
    _db.execSQL("CREATE TABLE IF NOT EXISTS room_master_table (id INTEGER PRIMARY KEY,identity_hash TEXT)");
    _db.execSQL("INSERT OR REPLACE INTO room_master_table (id,identity_hash) VALUES(42, 'f7294cddfc3c1bc56a99e772f0c5b9bb')");
}
As you can see, the Articles table and the two indices are created; the room_master_table is used for validation checking.
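Once you have extracted those statements, one way to build the pre-populated database file is with plain SQLite, outside of Room. A minimal sketch, assuming you run this once in a small utility to produce the seed file; the class name, file, and sample row are illustrative :-

import android.database.sqlite.SQLiteDatabase;

import java.io.File;

// Builds a seed database using the SQL copied from the generated
// MyDatabase_Impl, so the file's schema matches what Room expects.
public class SeedDatabaseBuilder {
    public static void build(File dbFile) {
        SQLiteDatabase db = SQLiteDatabase.openOrCreateDatabase(dbFile, null);
        db.execSQL("CREATE TABLE IF NOT EXISTS `Articles` (`id` INTEGER NOT NULL, `germanArticle` TEXT NOT NULL, `frenchArticle` TEXT, PRIMARY KEY(`id`))");
        db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` ON `Articles` (`germanArticle`)");
        db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_frenchArticle` ON `Articles` (`frenchArticle`)");
        // Insert your seed data here, e.g.:
        db.execSQL("INSERT INTO `Articles` (`id`, `germanArticle`, `frenchArticle`) VALUES (1, 'der', 'le')");
        db.close();
    }
}

If you are on Room 2.2.0 or later, you can then ship that file in the assets folder and have Room copy it on first creation via Room.databaseBuilder(...).createFromAsset("database/seed.db").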
Related
In terms of database design, I'm wondering what the best approach is: storing a reference ID, or embedding documents, even if that means the same document can appear more than once.
Let's say I have this kind of model for the moment:
Collection User:
{
  name: String,
  types: List<Type>,
  sharedTypes: List<Type>
}
If I use the embedded model and don't use another collection, it may result in duplicate Type objects. For example, user A creates Type aa and user B creates Type bb. When they share their types with each other, it will result in:
{
  name: UserA,
  types: [{name: aa}],
  sharedTypes: [{name: bb}]
},
{
  name: UserB,
  types: [{name: bb}],
  sharedTypes: [{name: aa}]
}
This results in duplication, so I guess it's a pretty bad design. Should I use another approach, like creating a Type collection and storing reference IDs?
Collection Type:
{
  id: String,
  name: String
}
This will still result in duplication, but not of a whole document; I guess that's better.
{
  name: UserA,
  types: ["randomString1"],
  sharedTypes: ["randomString2"]
},
{
  name: UserB,
  types: ["randomString2"],
  sharedTypes: ["randomString1"]
}
And the last approach, maybe the best one, is to store everything on the Type collection's side, like this:
Collection User:
{
  id: String,
  name: String
}
Collection Type:
{
  id: String,
  name: String,
  createdBy: String (id of user),
  sharedWith: List<String> (ids of users)
}
What is the best approach among these three?
My queries look like this: I have one group of users, and for each user I want the types they created and the types other people shared with them.
Broadly, the decision to embed vs. use a reference ID comes down to this:
Do you need to easily preserve the referential integrity of the joined data at a point in time, meaning you want to ensure that the state of the joined data is "permanently associated" with the parent data? Then embedding is a good idea. This is also a good practice in the "insert only" design paradigm. Very often other requirements like immutability, hashing/checksum, security, and archiving make the embedded approach easier to manage in the long run because version / createDate management is vastly simplified.
Do you need the fastest, most quick-hit scalability? Then embed and ensure indexes are appropriately constructed. An indexed lookup followed by the extraction of a rich shape with arbitrarily complex embedded data is a very high performance operation.
(Opposite) Do you want to ensure that updates to joined data are quickly and immediately reflected in a join with parents? Then use a reference ID and the $lookup function to bring the data together (a minimal sketch follows this list).
Does the joined data grow essentially without bound, like transactions against an account? This is likely better handled through a reference ID to a separate transaction collection and joined with $lookup.
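To illustrate the reference-ID plus $lookup approach against the third schema in the question, here is a minimal sketch with the MongoDB Java driver; the database, collection, and field names are assumptions taken from the question, not a prescribed design:

import java.util.Arrays;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Aggregates;
import org.bson.Document;

public class TypesByUser {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost")) {
            MongoCollection<Document> users =
                    client.getDatabase("mydb").getCollection("users");
            // For each user, join in the types whose createdBy references the user.
            for (Document doc : users.aggregate(Arrays.asList(
                    Aggregates.lookup("types", "id", "createdBy", "createdTypes")))) {
                System.out.println(doc.toJson());
            }
        }
    }
}

A second $lookup on sharedWith would pull in the shared types the same way.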
I've got this schema in DynamoDB:
{
  "timestamp" : "",
  "fruit" : {
    "name" : "orange",
    "translations" : [
      { "en-GB" : "orange" },
      { "sv-SE" : "apelsin" },
      ....
    ]
  }
}
I need to store translations for objects in a DynamoDB database, to be able to query them efficiently. E.g. my query has to be something like "give me all objects where translations array contains "
The problem is: is this a really dumb idea? There are 6,500 languages out there, and this means I would be forcing every entry to contain an array of thousands of properties, 99% of them empty strings. What's a better approach?
Thanks,
Unless you're willing to let DynamoDB do a table scan to get your results, I think you're using the wrong tool. Consider streaming your transactions to AWS Elasticsearch via something like Firehose. Firehose will give you a lot of nice-to-haves and can help you rotate transaction indexes. Elasticsearch should be able to store that structure and run your query.
If you don't go that route, then at least consider dropping the language code from your structure if you're not actually using it. Just make an array of the unique spellings of your fruit. This is the kind of query I might do with multiple queries instead of a single one: go from the spelling of the fruit name to a fruit UUID, which you can then query against.
I would rather save it as:
{
  "primaryKey" : "orange",
  "SecondaryKey" : "en-GB",
  "timestamp" : "",
  "Metadata" : {
    "name" : "orange"
  }
}
And create a secondary index with SecondaryKey as the partition key and primaryKey as the sort key.
By doing this you can query (see the sketch below):
Get me "orange" in en-GB.
What keys exist in en-GB?
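As a rough sketch of the second query with the AWS SDK for Java v2; the table name and index name are assumptions, only the attribute names come from the answer above:

import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;

public class TranslationsInLanguage {
    public static void main(String[] args) {
        try (DynamoDbClient ddb = DynamoDbClient.create()) {
            // Query the secondary index that has SecondaryKey as its partition key.
            QueryRequest request = QueryRequest.builder()
                    .tableName("translations")                  // assumed table name
                    .indexName("SecondaryKey-primaryKey-index") // assumed index name
                    .keyConditionExpression("SecondaryKey = :lang")
                    .expressionAttributeValues(
                            Map.of(":lang", AttributeValue.builder().s("en-GB").build()))
                    .build();
            ddb.query(request).items()
                    .forEach(item -> System.out.println(item.get("primaryKey").s()));
        }
    }
}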
If you are updating multiple items at once, you can create one object like this:
{
  "KeyName" : "orange",
  "SecondaryKey" : "master",
  "timestamp" : "",
  "fruit" : {
    "name" : "orange",
    "translations" : [
      { "en-GB" : "orange" },
      { "sv-SE" : "apelsin" },
      ....
    ]
  }
}
Then create a Lambda function that denormalises the above object and creates the individual items in DynamoDB. But you will also have to take care of deleting items whose language is no longer present in the new object.
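The diff that Lambda would need might look like this small sketch; the map-based representation and names are illustrative, not part of the answer above:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TranslationDiff {
    // Compares the old and new master translations and reports which
    // per-language items to upsert and which to delete.
    public static void apply(Map<String, String> oldTranslations,
                             Map<String, String> newTranslations) {
        // Upsert every language present in the new master object.
        newTranslations.forEach((lang, spelling) ->
                System.out.println("PUT " + lang + " -> " + spelling));

        // Delete languages that disappeared from the new master object.
        Set<String> removed = new HashSet<>(oldTranslations.keySet());
        removed.removeAll(newTranslations.keySet());
        removed.forEach(lang -> System.out.println("DELETE " + lang));
    }

    public static void main(String[] args) {
        apply(Map.of("en-GB", "orange", "sv-SE", "apelsin"),
              Map.of("en-GB", "orange")); // prints PUT en-GB, DELETE sv-SE
    }
}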
In a legacy project we had issues where, if a developer forgot a project_id in the query condition, rows for all projects would be shown instead of the single project they were meant to see. For example, for "Comments":
comments [id, project_id, message]
If you forget to filter by project_id, you see all projects. Sometimes this is caught by tests, sometimes not, but I would rather prevent it outright: the developer should straightaway see "WRONG/empty"!
To get around this, the product manager is insisting on separate tables for comments, like this:
project1_comments [id,message]
project2_comments [id,message]
Here, if you forgot the project/table name and something still passed tests and got deployed, you would get nothing or an error.
However, the difficulty is then with associated tables, for example "Files" linked to "Comments":
files [ id, comment_id, path ]
3, 1, files/foo/bar
project1_comments
id | message
1 | Hello World
project2_comments
id | message
1 | Bye World
This then turns into a database per project, which seems overkill.
Another possibility: how could one add a behavior on the Comments model to ensure that any find/select query includes the foreign key, e.g. project_id?
Many thanks in advance.
In a legacy project we had issues where, if a developer forgot a project_id in the query condition
CakePHP generates the join conditions based upon the associations you define for the tables. They are automatic when you use contain(), and it's unlikely a developer would make such a mistake with CakePHP.
To get around this, the product manager is insisting on separate tables for comments, like this:
Don't do it. Seems like a really bad idea to me.
Another possibility, how to add a Behaviour on the Comments model to ensure any find/select query does include the foreign key, eg - project_id?
The easiest solution is to just forbid all direct queries on the Comments table.
class CommentsTable extends Table
{
    public function find($type = 'all', $options = [])
    {
        throw new \Cake\Network\Exception\ForbiddenException('Comments cannot be used directly');
    }
}
Afterwards only Comments read via an association will be allowed (associations always have valid join conditions), but think twice before doing this as I don't see any benefits in such a restriction.
You can't easily restrict direct queries on Comments to only those that contain a project_id in the where clause. The problem is that where clauses are an expression tree, and you'd have to traverse the tree and check all the different kinds of expressions. It's a pain.
What I would do is restrict Comments so that project_id has to be passed as an option to the finder:
$records = $Comments->find('all', ['project_id' => $project_id])->all();
What the above does is pass $project_id as an option to the default findAll method of the table. We can then override that method and force project_id to be a required option for all direct Comments queries.
public function findAll(Query $query, array $options)
{
    $project_id = Hash::get($options, 'project_id');
    if (!$project_id) {
        throw new ForbiddenException('project_id is required');
    }
    return $query->where(['project_id' => $project_id]);
}
I don't see an easy way to do the above via a behavior, because the where clause contains only expressions by the time the behavior is executed.
I have a non-standard question about CakePHP 3.3. Let's imagine that in my database I have two tables, A and B. Both are identical; the first is dedicated to data in the first language, the second to data in the second language.
I have correctly coded the whole website for table A (table B is not yet in use). Additionally, I implemented the .po file mechanism to switch the language of the interface. The language of the interface switches correctly.
How can I easily plug in table B? I do not want to add IF-ELSE statements everywhere, because the website is getting big and there are already many operations on table A. Is there a way to simply map table A to table B when the language is switched from en_US to pl_PL (through the .po files)?
The simplest option that comes to mind would be to inject the current locale into your existing table class, and have it set the database table name accordingly.
Let's assume your existing table class is called SomeSharedTable; this could look something along the lines of:
// ...
class SomeSharedTable extends Table
{
    public function initialize(array $config)
    {
        if (!isset($config['locale'])) {
            throw new \InvalidArgumentException('The `locale` config key is missing');
        }

        $table = 'en_table';
        if ($config['locale'] === 'pl_PL') {
            $table = 'pl_table';
        }
        $this->table($table);

        // ...
    }

    // ...
}
And before your application code involves the model layer, and after it sets the locale of course (that might for example be in your bootstrap), configure the alias that you're using throughout your application (for this example we assume that the alias matches the table name):
\Cake\ORM\TableRegistry::config('SomeShared', [
    'locale' => \Cake\I18n\I18n::locale()
]);
Given that it's possible that the locale might not make it into the class for whatever reason, you should implement some safety measures; I've just added the basic isset() check for example purposes. Since a wrongly configured table class could cause quite some problems, you probably want to add checks that are a little more sophisticated.
Ok, maybe this is too broad for StackOverflow, but is there a good, generalized way to assemble data in relational tables into hierarchical JSON?
For example, let's say we have a "customers" table and an "orders" table. I want the output to look like this:
{
  "customers": [
    {
      "customerId": 123,
      "name": "Bob",
      "orders": [
        {
          "orderId": 456,
          "product": "chair",
          "price": 100
        },
        {
          "orderId": 789,
          "product": "desk",
          "price": 200
        }
      ]
    },
    {
      "customerId": 999,
      "name": "Fred",
      "orders": []
    }
  ]
}
I'd rather not have to write a lot of procedural code to loop through the main table and fetch orders a few at a time and attach them. It'll be painfully slow.
The database I'm using is MS SQL Server, but I'll need to do the same thing with MySQL soon. I'm using Java and JDBC for access. If either of these databases had some magic way of assembling these records server-side it would be ideal.
How do people migrate from relational databases to JSON databases like MongoDB?
Here is a useful set of functions for converting relational data to JSON and XML and from JSON back to tables: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
SQL Server 2016 is finally catching up and adding support for JSON.
The JSON support still does not match other products such as PostgreSQL, e.g. no JSON-specific data type is included. However, several useful T-SQL language elements were added that make working with JSON a breeze.
E.g. in the following Transact-SQL code a text variable containing a JSON string is defined:
DECLARE @json NVARCHAR(4000)
SET @json =
N'{
  "info":{
    "type":1,
    "address":{
      "town":"Bristol",
      "county":"Avon",
      "country":"England"
    },
    "tags":["Sport", "Water polo"]
  },
  "type":"Basic"
}'
and then, you can extract values and objects from JSON text using the JSON_VALUE and JSON_QUERY functions:
SELECT
  JSON_VALUE(@json, '$.type') as type,
  JSON_VALUE(@json, '$.info.address.town') as town,
  JSON_QUERY(@json, '$.info.tags') as tags
Furthermore, the OPENJSON function allows returning elements from a referenced JSON array:
SELECT value
FROM OPENJSON(@json, '$.info.tags')
Last but not least, there is a FOR JSON clause that can format a SQL result set as JSON text:
SELECT object_id, name
FROM sys.tables
FOR JSON PATH
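Since the question mentions Java and JDBC: SQL Server returns a FOR JSON result as a single column whose text may be split across several result-set rows, so the client should concatenate the chunks. A minimal sketch for the customers/orders shape from the question; the connection string, table, and column names are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ForJsonExample {
    public static void main(String[] args) throws Exception {
        // Nested FOR JSON subquery builds the orders array per customer.
        String sql =
                "SELECT c.customerId, c.name, " +
                "  (SELECT o.orderId, o.product, o.price " +
                "   FROM orders o WHERE o.customerId = c.customerId " +
                "   FOR JSON PATH) AS orders " +
                "FROM customers c " +
                "FOR JSON PATH, ROOT('customers')";
        // Requires the mssql-jdbc driver on the classpath.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=shop;user=sa;password=secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            StringBuilder json = new StringBuilder();
            while (rs.next()) {          // concatenate the JSON chunks
                json.append(rs.getString(1));
            }
            System.out.println(json);
        }
    }
}

One caveat: a customer with no orders gets no orders property at all (rather than the empty array the question shows), so that case needs post-processing.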
Some references:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
https://learn.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/05/json-in-sql-server-2016-part-1-of-4/
https://www.red-gate.com/simple-talk/sql/learn-sql-server/json-support-in-sql-server-2016/
I think one 'generalized' solution would be as follows:
Create a SELECT query which joins all the required tables to fetch the results as a two-dimensional structure (like a CSV / temporary table, etc.).
If each row of this join is unique, and the MongoDB schema and the columns have a one-to-one mapping, then it's all about importing this CSV/table using the mongoimport command with the required parameters.
But a case like the above, where a given customer ID can have an array of 'orders', needs some computation before mongoimport.
You will have to write a program which can 'vertically merge' the orders for a given customer ID (a rough sketch follows). For a small set of data, a simple Java program will work; for larger sets, parallel processing with Spark can do the job.
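As a rough illustration of that 'vertical merge' step; the flat (customerId, orderId) row layout is an assumption, and a real program would read the rows from the CSV/temporary table:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VerticalMerge {
    // Groups flat (customerId, orderId) rows into one entry per customer,
    // ready to be emitted as JSON lines for mongoimport.
    public static Map<String, List<String>> merge(List<String[]> rows) {
        Map<String, List<String>> byCustomer = new LinkedHashMap<>();
        for (String[] row : rows) {
            byCustomer.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return byCustomer;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"123", "456"},
                new String[]{"123", "789"},
                new String[]{"999", "111"});
        System.out.println(merge(rows)); // {123=[456, 789], 999=[111]}
    }
}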
SQL Server 2016 now supports reading JSON in much the same way as it has supported XML for many years: using OPENJSON to query directly, with the JSON itself stored in ordinary NVARCHAR columns.
There is no generalized way, because SQL Server doesn't support JSON as a native data type. You'll have to create your own "generalized way" for this.
Check out this article; it has good examples of how to convert SQL Server data to JSON format.
https://www.simple-talk.com/blogs/2013/03/26/sql-server-json-to-table-and-table-to-json/