Creating Apache Ignite Cluster - database

I have been trying to create an Apache Ignite cluster lately and am pretty new to this. I am facing a couple of issues while creating it and migrating an old database to Apache Ignite.
These are the two issues I am facing now, and I would appreciate input on them.
I want to add an auto-incrementing primary key to a table. Is there any way to implement that directly in the database? I have seen that Apache Ignite doesn't support auto-increment. I know we can add auto-increment functionality from code, but is there any way to implement it in the database itself?
Adding affinity keys. I have another field that I want to use as the affinity key, and as far as I have seen, we can only use key fields as affinity keys in Apache Ignite. The solution I came up with is to use the second field as a secondary key. Is there a better solution for this?
If you have any better solutions for these issues, please let me know.
Thanks and Regards

It’s not really possible to have an auto-increment field in a distributed system without dramatically slowing it down. You can do it manually with IgniteAtomicSequence or IgniteAtomicLong, or maybe use something like a UUID.
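For example, a minimal sketch of ID generation with IgniteAtomicSequence (the sequence name "orderIdSeq" is made up; note that each node reserves a batch of sequence values, so the IDs are unique cluster-wide but may contain gaps):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteAtomicSequence;
    import org.apache.ignite.Ignition;

    public class SequenceExample {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                // Creates the sequence on first use (initial value 0, create = true);
                // every node in the cluster shares the same sequence.
                IgniteAtomicSequence seq = ignite.atomicSequence("orderIdSeq", 0, true);
                long nextId = seq.incrementAndGet(); // cluster-wide unique id
                System.out.println("Generated id: " + nextId);
            }
        }
    }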
I’m not sure I understand your question. The affinity key needs to be part of the key. The whole key is unique; the affinity key tells Ignite which node the data should be stored on. More here: https://www.gridgain.com/docs/latest/developers-guide/data-modeling/affinity-collocation
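For illustration, here is a sketch of what "part of the key" looks like in practice: a composite key class where one field carries the @AffinityKeyMapped annotation (class and field names are invented):

    import java.util.Objects;

    import org.apache.ignite.cache.affinity.AffinityKeyMapped;

    // Composite cache key: 'id' keeps the whole key unique, while 'companyId'
    // (the affinity key) decides which node the entry is stored on, so all
    // records for one company end up collocated.
    public class PersonKey {
        private final long id;

        @AffinityKeyMapped
        private final long companyId;

        public PersonKey(long id, long companyId) {
            this.id = id;
            this.companyId = companyId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PersonKey)) return false;
            PersonKey other = (PersonKey) o;
            return id == other.id && companyId == other.companyId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, companyId);
        }
    }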

Related

Symfony2 Doctrine Use Cache Tables

For the project I'm working on, we have a fully normalized database where no information is redundant.
I'd like to keep this method, but also add "cache" tables, which are essentially tables which have pre-computed information. I'd love to be able to have this information in separate tables (which could then be blown away and regenerated as needed).
For example, part of this involves a forum. One "cached" value would be the number of posts a user has made. There is no need to keep this in any of the normalized tables, because it can be calculated based on a count of posts linked with that user. However, this is a (relatively) expensive call, so the cache table would keep track of this value for me and I can pull from it as needed.
I'm also strongly considering using a NoSQL database like MongoDB for this, because the cached tables would essentially have no joins or foreign keys (making it perfect for MongoDB).
Any ideas how I should approach this using Doctrine in Symfony2? Anyone done this before?
Thanks a ton!
Update
As greg0ire comments, it looks like Doctrine has some built-in caching functionality: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/caching.html
Does anyone know if I can employ this to cache my values without storing them in the database?
For example, if I had an unmapped property $postCount, can I use Doctrine to cache that value (or I guess, the object with that value populated)?
The only problem with this approach (caching to memory instead of a database) is that we're working in a clustered environment, so I'd either have to build the cache multiple times (once for each server the user hits) or get a shared caching server set up (which is a bit tricky).
I'll continue to investigate this route, but does anyone know of any database stored methods?
Thanks.
I think you may be looking for Doctrine's result cache.
Here is the related part of the sf2 configuration.

Pig Latin script for database access

I am trying to implement a surrogate key generator using Pig.
I need to persist the last generated key in a database and query the database for the next available key.
Is there any support in Pig for querying a database over ODBC?
If yes, please provide guidance or some samples.
Sorry for not answering your question directly, but this is not something you want to be doing. For a few reasons:
Your MapReduce job is going to hammer your database as a single performance chokepoint (you are basically defeating the purpose of Hadoop).
With speculative execution, the same data can get loaded twice, so some unique identifiers won't exist once one of the duplicate tasks gets killed.
I think if you can conceivably hit the database once per record, you can just do this surrogate key enrichment without MapReduce, in a single thread.
Either way, building surrogate keys or automatic counters is not easy in Hadoop because of its shared-nothing architecture.
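To make the single-threaded alternative concrete, here is a hedged JDBC sketch that reserves a block of keys from a counter table and assigns them locally; the table key_counter, its column, and the connection URL are all assumptions for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    public class SurrogateKeyEnricher {

        // Reserve a block of keys: read the high-water mark, advance it,
        // commit, then hand the block out locally with no further round-trips.
        static long reserveBlock(Connection conn, int blockSize) throws SQLException {
            conn.setAutoCommit(false);
            long last;
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT last_key FROM key_counter FOR UPDATE");
                 ResultSet rs = sel.executeQuery()) {
                rs.next();
                last = rs.getLong(1);
            }
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE key_counter SET last_key = ?")) {
                upd.setLong(1, last + blockSize);
                upd.executeUpdate();
            }
            conn.commit();
            return last + 1; // first key of the reserved block
        }

        public static void main(String[] args) throws SQLException {
            List<String> records = List.of("a", "b", "c"); // stand-in for the real input
            try (Connection conn =
                     DriverManager.getConnection("jdbc:postgresql://localhost/mydb")) {
                long next = reserveBlock(conn, records.size());
                for (String record : records) {
                    System.out.println(next++ + "\t" + record); // enriched output
                }
            }
        }
    }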

Autoincrement with Grails/Hibernate for different DBs

It's not really a problem, but it surprises me:
When I use Grails with different DBs, I get different counter increments:
with the out-of-the-box HSQLDB, every table gets its own counter, which is always incremented by 1
with an Oracle DB, it seems that all tables use the same global counter
now with JavaDB/Derby, the generated IDs are huge!
Where can I find more information about this behaviour, and which one is best?
HSQLDB seems to keep the counters small.
With Oracle, I get a globally unique ID, which is also a nice feature.
But what about the Derby behaviour?
It really depends on the default ID generation strategy in the specific dialect. Grails allows you to customize the generation strategy with the mapping closure.
The 'safest' generation strategy (i.e. the one supported by every RDBMS) is TABLE, and it is the preferred choice of many JPA implementations. This is probably what you get with HSQLDB. However, Oracle supports sequences, and these objects are generally better optimized for key generation; hence the Oracle dialect seems to use one global sequence. I'm not familiar with Derby, but there is probably identity column support there, and what you get is some sort of UUID.
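For reference, a sketch of how the strategy can be pinned down in plain JPA annotations, which is roughly what Grails delegates to underneath (the Book entity is invented; in Grails itself you would set this through the mapping closure):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;

    @Entity
    public class Book {

        // AUTO delegates the choice to the dialect, which is exactly why HSQLDB,
        // Oracle and Derby all behave differently out of the box. Pinning the
        // strategy to TABLE, SEQUENCE or IDENTITY makes the behaviour consistent
        // across databases.
        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE)
        private Long id;

        private String title;
    }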

Can I use RavenDB (NoSQL) or should I just be using MySQL(RDBMS)?

I am starting on an ASP.NET MVC 3 general management system (project management being the first component). I have been reading up a bit on RavenDB, and it sounds pretty interesting. One of the biggest things I like about it is that I would not need any kind of ORM to handle the data from the DB, which would make my code a lot cleaner and quicker. However, coming from a background of working exclusively with MySQL for the past 6+ years, I tend to think very relationally about my data, and there are a few things that NoSQL seems like it would not be good for. I want to throw these out there; maybe these issues can be handled in a NoSQL solution and I am just thinking too relationally (then again, maybe this project should be done with MySQL). These are the issues I am thinking of:
Unique identifiers: I am going to want unique identifiers for a lot of things. For something like projects, the name should be unique and I could use that; however, when it comes to tasks under a project, the title may not be unique. This is where I would use an auto-increment field, but I can't do that in RavenDB (from what I can tell).
Linking: For fields like status and type, I would normally link with a foreign key. For one-to-many relationships, I can just store the text instead of trying to link with a foreign key (which you don't have in NoSQL), but many-to-many linking becomes a problem. For example, I intend to have a tagging system (like on here) where most items can have one to many tags attached, and I can then search on those tags for the items. Is there a way to do this in NoSQL?
Is an RDBMS really the best tool for the job here, or am I just not properly thinking the 'NoSQL' way, and can I accomplish this with NoSQL (RavenDB)?
I know this is an old post. Perhaps the docs weren't as good when originally written. But for reference in case other stumble here:
Raven comes with a HiLo document id generation strategy by default. Storing a new document without specifying an id yourself will get an auto-incrementing id such as "projects/1", "projects/2", etc. Read more here.
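Conceptually, HiLo boils down to something like the following sketch; this is the algorithm itself, not the actual RavenDB client API, and fetchNextRange is a hypothetical stand-in for the single server round-trip:

    public class HiLoGenerator {
        private long current = 0; // next id to hand out
        private long max = -1;    // end of the reserved range (inclusive); -1 forces a fetch
        private final int rangeSize;

        public HiLoGenerator(int rangeSize) {
            this.rangeSize = rangeSize;
        }

        public synchronized long nextId() {
            if (current > max) {
                long lo = fetchNextRange(rangeSize); // one server round-trip per range
                current = lo;
                max = lo + rangeSize - 1;
            }
            return current++;
        }

        // Hypothetical stand-in for the server call: in RavenDB the server
        // atomically advances a counter by 'size' and returns the start of the
        // newly reserved range. Hard-coded here purely for illustration.
        private long fetchNextRange(int size) {
            return 1;
        }
    }

The point is that only the range reservation touches the server; ids within a reserved range are handed out from local memory, which is what makes ids like "projects/1", "projects/2" cheap to generate.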
The best guidance on the different ways to handle document relationships is here in the documentation. For the situation you described, you don't really need a separate document at all. You can simply embed a string array of tag names into each item. Documents are not flat, they can be structured. And yes, you can still query on them.
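As an illustration of that embedding approach, the stored document might be shaped like this (a sketch; class and field names are invented):

    import java.util.List;

    // Sketch of the stored document's shape: tags are embedded directly in the
    // item instead of living in a join table. All names here are invented.
    public class TaskDocument {
        public String id;         // e.g. "tasks/1", assigned by the HiLo strategy
        public String title;
        public String status;     // denormalized text instead of a foreign key
        public List<String> tags; // e.g. ["bug", "ui"]; still queryable via an index
    }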
Hopefully you've discovered this on your own since the original post.
Ayende wrote a post, "Modeling reference data in RavenDB", which answers some of your questions regarding linking. You will have copies of the data between the reference document and your other documents, and that redundancy is "OK" for document databases. You can still build indexes or query based on either the id or the text that you store.
I would favor SQL for a transactional system, such as an accounts receivable application, where you need to perform ad hoc queries. With a document database, you really need to think through how you will be fetching your data and build indexes up front to answer those questions. RavenDB also has a dynamic indexing feature that learns from and caches the queries fired at the database.
For project management, where the majority of items would be tasks, I would think RavenDB would fit your needs.

Loosely Coupled Database Design - How To?

I'm implementing a web-based application using Silverlight, with a SQL Server DB on the back end for all the data that the application will display. I want to ensure that the application can scale easily, and I feel the way to go is to make the database loosely coupled and not tie everything up with foreign keys. I've tried searching for some examples, but to no avail.
Does anyone have any information or good starting points/samples/examples to help me get off the ground with this?
Help greatly appreciated.
Kind regards,
I think you're mixing up your terminology a bit. "Loosely coupled" refers to the desirability of having software components that aren't so dependent upon each other that they can't function or even compile without being together in the same program. I've never seen the term used to describe the relationships between tables in the same database.
I think if you search on the terms "normalization" and "denormalization" you'll get better results.
Unless you're doing massive numbers of inserts at a time, as with a data warehouse, use foreign keys. Normalization scales extremely well, and you should take advantage of that. Foreign keys are fast, and the constraint only really holds you back if you're inserting millions upon millions of records at a time.
Make sure you're using integer keys with a clustered index on them; this should make joining tables very fast. The issues you can get wrapped up in without foreign keys are many and frustrating. I just spent all weekend dealing with them, and we made a conscious choice not to have foreign keys (we have terabytes of data, though).
Before you even think of doing such a thing, you need to think about data integrity. Foreign keys exist so that you cannot put records into tables if the primary data they are based on is not there. If you do not use foreign keys, you will sooner or later (probably sooner) end up with worthless data, because you won't really know, for instance, which customer an order is attached to. Foreign keys are data protection; you should never consider not using them.
And even though you think all your data will come from your application, in real life this is simply not true. Data comes in from multiple applications, from imports of large amounts of data, and from the query window (think about when someone decides to update all the prices; they aren't going to do that one price at a time from the user interface). Data can get into the database from many sources and must be protected at the database level. To do less is to put your entire application and its data at risk.
Interesting comment about database security when data is input through external sources like database scripts.
