Graph database schema design - Is this suitable for neo4j?


Scenario: a simple address book where a user can create their own contacts and organize them by adding them to groups. A contact may have multiple addresses.
I have created the following diagram:
![schema-design][1]
I want to query all the contacts who are placed in group x and live in country y.
Is this schema design good enough for those purposes (I want to use the neo4j database)?

It looks like the notion of country should be a first-class citizen in your graph, since your query depends on it. Graph model design is typically influenced heavily by your query patterns.
So I suggest having a node labeled Country for each country and connecting each Address node to its country with a :LOCATED_IN relationship (and consequently dropping the country property from the Address nodes).
With that change your query is as easy as:
MATCH (:Group{name:'family'})<-[:placed_in_group]-(contact)-[:`lives-at`]->()-[:LOCATED_IN]->(:Country{name:'US'})
RETURN contact
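
To make the traversal concrete, here is a minimal sketch in plain Python, with dictionaries standing in for the three relationship types. It is not Neo4j code; the contact names, address IDs, and the function name are all hypothetical sample data.

```python
# Plain-Python stand-in for the graph: each dict plays the role of one
# relationship type. All names and sample data here are hypothetical.
placed_in_group = {
    "alice": {"family"},
    "bob": {"family", "work"},
    "carol": {"work"},
}
lives_at = {"alice": {"addr1"}, "bob": {"addr2"}, "carol": {"addr3"}}
located_in = {"addr1": "US", "addr2": "DE", "addr3": "US"}


def contacts_in_group_and_country(group, country):
    """Return contacts in `group` with at least one address in `country`."""
    return sorted(
        contact
        for contact, groups in placed_in_group.items()
        if group in groups
        and any(
            located_in.get(address) == country
            for address in lives_at.get(contact, ())
        )
    )


print(contacts_in_group_and_country("family", "US"))
```

Because the country is reached by following a relationship from the address, the filter works on the structure of the graph rather than on a property value.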

One option is to have a separate Country node connected to the address, as pointed out by Stefan Armbruster. If you do not want to change the data structure, just add an index on the country property of Address. Then you can use a query like:
MATCH (:Group{name:'family'})<-[:placed_in_group]-(contact)-[:`lives-at`]->(:Address{country:'US'})
RETURN contact

Related

NoSql - entity holds an owner ID field vs owner holds list of child ID's

I am currently exploring MongoDB.
I built a notes web app and for now the DB has 2 collections: notes and users.
The user can create, read and update his notes.
I want to create a page called /my-notes that will display all the notes that belong to the connected user.
My question is:
Should the notes model have an ownerId field, or the opposite: should the user model have a noteIds field of type list?
Points I found relevant for the decision making:
noteIds approach:
There is no need to search the notes collection for the desired ownerId (with a lot of notes, that would need an index and a search across the whole notes collection). We just need to find the user by user ID and then get all the notes by their IDs.
In this case there are 2 calls to DB.
The data is ordered by the order of insertion into the noteIds field in the document.
ownerId approach:
We do need to find the notes by their ownerId field across the notes collection, which might be more computationally intensive.
We can paginate / sort the data as we want - more control over the data.
Are there any more points you can think of?
As far as I can tell, this is a question of whether you want less computationally intensive DB calls or more control over the data.
What are the "best practices"?
Thanks,

A similar use case is explained in the documentation. If there is no limit on the number of notes a user can have, it might be better to store a userId reference field in each note document.
As you've figured out already, pagination would be easier in the second approach. Also, when updating notes, you can simply updateOne({ _id: "note_id", userId: 1 }) instead of checking the user's document to see whether the note actually belongs to the user.
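
Both points can be sketched in plain Python, with a list of dicts standing in for the notes collection. This is not driver code: the createdAt field, the sample data, and the function names are assumptions made for illustration, and the owner field is called ownerId as in the question.

```python
# In-memory stand-in for the notes collection (hypothetical sample data).
notes = [
    {"_id": 1, "ownerId": "u1", "title": "a", "createdAt": 3},
    {"_id": 2, "ownerId": "u2", "title": "b", "createdAt": 1},
    {"_id": 3, "ownerId": "u1", "title": "c", "createdAt": 2},
    {"_id": 4, "ownerId": "u1", "title": "d", "createdAt": 5},
]


def notes_page(owner_id, page, page_size):
    """Filter by owner, sort newest first, then skip/limit for one page."""
    owned = sorted(
        (note for note in notes if note["ownerId"] == owner_id),
        key=lambda note: note["createdAt"],
        reverse=True,
    )
    start = page * page_size
    return owned[start:start + page_size]


def update_note(note_id, owner_id, changes):
    """Update only if both the note ID and the owner match.

    Returns the number of matched notes (0 or 1), so a note that belongs
    to someone else is never touched.
    """
    for note in notes:
        if note["_id"] == note_id and note["ownerId"] == owner_id:
            note.update(changes)
            return 1
    return 0
```

Putting the owner in the filter means the ownership check and the update happen in a single step, with no prior lookup of the user's document.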

How to structure a database with Firebase

I need to store data under a specific user's account with Firebase. I'm fairly new to working with the backend and I'm primarily looking for a second opinion before I start typing away.
This is how I currently have it structured:
Providers
  City
    Cincinnati
      Company 1
        Jobs
        History
      Company 2
        Jobs
        History
    Columbus
      Company 1
      Company 2
My thought is that it would be better to have each company listed within a specific city node, for when a user is requesting from that city. However, with the city at the first level we'd always have to go through the city first. The alternative is that, instead of having the city at the higher level, we store the cities each company is in under the company's node, essentially in an array.
So here's the user flow:
The user makes a request based on their location (city). The provider accepts that specific job which moves into the 'jobs' node. Once the job is complete it then moves into their 'history' node.
My question:
Should I keep the structure as is or place the city within the companies node?

Your structure is right. City is effectively your primary key in this case: since each company belongs to a city, you first need to specify the city you want to pull companies from. I think this structure is the right way to solve your problem.
If your parent node were not City and the city names sat directly at the top level, you would end up with a lot of children in your main tree, which is what you don't want. You want to structure your database so that it stays readable when the data gets big.
PS: Think about scalability. If you think this way, you will end up with a well-organized database. The way you are doing it now is OK, but if you have further doubts, think this way to improve your structure.
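
As a rough illustration of the city-first layout and the user flow from the question, here is a sketch that uses a nested Python dict as a stand-in for the Firebase tree. It does not use the Firebase SDK; the function names, job IDs, and job fields are hypothetical.

```python
# Hypothetical sample data mirroring
# Providers/City/<city>/<company>/{Jobs, History}.
db = {
    "Providers": {
        "City": {
            "Cincinnati": {
                "Company 1": {"Jobs": {}, "History": {}},
                "Company 2": {"Jobs": {}, "History": {}},
            },
            "Columbus": {
                "Company 1": {"Jobs": {}, "History": {}},
                "Company 2": {"Jobs": {}, "History": {}},
            },
        }
    }
}


def companies_in(city):
    """List the companies stored under one city node."""
    return sorted(db["Providers"]["City"].get(city, {}))


def accept_job(city, company, job_id, job):
    """A provider accepts a request: the job lands in its Jobs node."""
    db["Providers"]["City"][city][company]["Jobs"][job_id] = job


def complete_job(city, company, job_id):
    """A finished job moves from the Jobs node to the History node."""
    node = db["Providers"]["City"][city][company]
    node["History"][job_id] = node["Jobs"].pop(job_id)
```

With the city as the parent, listing the providers for a request is a single read of one city node.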

filemaker database relationships

I'm very new to FileMaker currently working on a Mac. I've been assigned a new simple system to work towards completing and I have bumped into some issues with database relationships. I've got experience with PHP/MySQL databases connections etc. but FileMaker seems to require a somewhat different mindset and approach.
I'll try to explain this as simply as I can.
Here's the table relationships in my database
What I'm trying to do is a list of "to-do" notes, an interactive menu where the user can add things that need to be done. I've done this with a portal on a layout based on the table "site". The portal is based on the table "todo_notes", which is connected to site through the "site_id".
Here's what it looks like in browse mode
What I'm having problems with is adding a relationship between the todo_notes and contacts. The contacts are two separate tables called "county_contacts" and "property_owner_contacts". What I want to accomplish is the possibility for the user to, from a dropdown-list, add a single contact from these two tables. Preferably I'd like to sort of merge these two tables into the same dropdown-list.
Let me know if you need any other information or a better explanation of my issue. Any help is very welcome!

If you have a single contacts table with foreign keys for both county and property owner tables, that would let you have a single list for all contacts. From there you could also build a value list based on a relationship, for example to filter only contacts that belong to either county or property owners.
If you then need to further normalize the tables, fields that pertain to either relationship exclusively could be moved to another table from there, as a one to one relationship, if that is a concern.

The Short Answer
You need to create a Contacts table. FileMaker has no way of dynamically generating value lists. Instead, a value list is based on a single field, so the only way of generating a list of the contact names is to have them all in the same table.
The Long Answer
Because FileMaker only allows us to use ONE field for a value list, we must create a new table for the contacts. I would recommend that you replace the two contact tables with a single contact table (seeing as the fields look the same between the two tables) and then add a toggle on the contact for Owner or County. However, you could also create a single contact table for all of the fields that overlap, with foreign keys to the owner and county tables.
You would then use the fullname field from the contact and be good to go.
That is, assuming that you did not want to filter the contacts at all or only show contacts associated with this site.
To start with, I highly recommend using the Anchor-buoy method for organizing the relationship graph. Here's an explanation of the anchor-buoy method: http://sixfriedrice.com/wp/six-fried-rice-methodology-part-2-anchor-buoy-and-data-structures/ . It's just a convention, but will help you with the idea of context in FileMaker. It's widely accepted among the FileMaker community as the "right" way to organize a relationship graph. I will continue my explanation using this method.
Each Table Occurrence (the boxes in the graphs, or TO) represents a unique context from which you can view and edit information. In the anchor buoy method, each Table only has one "anchor" TO. I would recommend only using anchor TO's for the context of your layouts. Then, your portal, and any other corresponding information, will be on your buoy TO's. Here is what your new portal relationship would look like. You would select fields from your buoy TO's to use in the portal.
The easiest way to filter your value list by only contacts associated with this site would be to create a foreign key from the contact table to the site, and then add a TO to the graph, for the contact table. You would then click "Include only related values starting from" radio button, and specify your new TO.

Denormalization of database tables for Lucene indexing

I am just starting up with Lucene, and I'm trying to index a database so I can perform searches on the content. There are 3 tables that I am interested in indexing:
1. Image table - this is a table where each entry represents an image. Each image has a unique ID and some other info (title, description, etc).
2. People table - this is a table where each entry represents a person. Each person has a unique ID and other info (name, address, company, etc).
3. Credited table - this table has 3 fields (image, person, and credit type). Its purpose is to associate people with an image as the credits for that image. Each image can have multiple credited people (there's the director, photographer, props artist, etc). Also, a person can be credited in multiple images.
I'm trying to index these tables so I can perform some searching using Lucene but as I've read, I need to flatten the structure.
The first solution that came to me would be to create a Lucene document for each combination of Image/Credited Person. I'm afraid this will create a lot of duplicate content in the index (all the details of an image/person would have to be duplicated in each Document for each person that worked on the image).
Is there anybody experienced with Lucene that can help me with this? I know there is no generic solution to denormalization, that is why I provided a more specific example.
Thank you, and I will gladly provide more info on the database if anybody needs it.
PS: Unfortunately, there is no way for me to change the structure of the database (it belongs to the client). I have to work with what I have.

You could create a Document for each person with all the associated images' descriptions concatenated (either appended to the person info or in a separate Field).
Or, you could create a minimal Document for each person, create a Document for each image, put the creators' names and credit info in a separate field of the image Document, and link them by putting the person ID (or person Document ID) in a third, non-indexed field. (Lucene is geared toward flat document indexing, not relational data, but relations can be defined manually.)
This is really a matter of what you want to search for, images or persons, and whether each contains enough keywords for search to function. Try several options, see if they work well enough and don't exceed the available space.
The credit table will probably not be a good candidate for Document construction, though.
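
The second option can be sketched as a flattening step that runs before indexing. This is plain Python that builds dicts, not Lucene API code; the sample rows, the field names (credits, person_ids), and the function name are hypothetical.

```python
# Hypothetical rows standing in for the three database tables.
images = {
    1: {"title": "Sunset", "description": "Beach at dusk"},
    2: {"title": "Harbor", "description": "Boats at dawn"},
}
people = {10: {"name": "Ann Lee"}, 11: {"name": "Bo Chen"}}
credited = [  # (image ID, person ID, credit type)
    (1, 10, "photographer"),
    (1, 11, "director"),
    (2, 10, "director"),
]


def build_image_documents():
    """Flatten the three tables into one document per image."""
    docs = []
    for image_id, image in sorted(images.items()):
        credits = [
            (person_id, people[person_id]["name"], credit_type)
            for (img, person_id, credit_type) in credited
            if img == image_id
        ]
        docs.append({
            "image_id": image_id,
            "title": image["title"],
            "description": image["description"],
            # Searchable text: names and credit types in one field.
            "credits": " ".join(f"{name} ({kind})" for _, name, kind in credits),
            # Link back to the person documents; meant to be stored only.
            "person_ids": [person_id for person_id, _, _ in credits],
        })
    return docs
```

Each image appears once, so image details are not duplicated per credited person; only the short names and credit types are repeated.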

Best approach to views on archive data with change logs

(Sorry about the vagueness of the title; I can't think how to really say what I'm looking for without writing a book.)
So in our app, we allow users to change key pieces of data. I'm keeping records of who changed what when in a log schema, but now the problem presents itself: how do I best represent that data in a view for reporting?
An example will help: a customer's data (say, billing address) changed on 4/4/09. Let's say that today, 10/19/09, I want to see all of their 2009 orders, before and after the change. I also want each order to display the billing address that was current as of the date of the order.
So I have 4 tables:
Orders (with order data)
Customers (with current customer data)
CustomerOrders (linking the two)
CustomerChange (which holds the date of the change, who made the change (employee id), what the old billing address was, and what they changed it to)
How do I best structure a view to be used by reporting so that the proper address is returned? Or am I better served by creating a reporting database and denormalizing the data there, which is what the reports group is requesting?

There is no need for a separate DB if this is the only thing you are going to do. You could just create a de-normalized table/cube...and populate and retrieve from it. If your data is voluminous apply proper indexes on this table.

Personally, I would design this so you don't need the change table for the report. It is bad practice to store an order without storing all of its data as of the order date in a table. You look up the address from the address table and store it with the order (the same goes for part numbers, company names, and anything else that changes over time). You never get information on an order by joining to customer, address, part number, price tables, etc.
Audit tables are more for fixing bad changes or looking up who made them than for reporting.
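
For existing orders that were saved without an address, the address as of the order date can still be recovered from the change log, for example to backfill the orders once. The following is a minimal sketch in plain Python rather than SQL; the field names and sample data are hypothetical, and it assumes a change takes effect for orders placed on the day of the change.

```python
from datetime import date

# Hypothetical stand-ins for Customers (current data) and CustomerChange.
current_address = {1: "2 New Ave"}
customer_changes = [
    {
        "customer_id": 1,
        "changed_on": date(2009, 4, 4),
        "old_address": "1 Old St",
        "new_address": "2 New Ave",
    },
]


def address_as_of(customer_id, on_date):
    """Return the billing address that was current on `on_date`.

    The earliest change made after `on_date` recorded the address that
    was in effect on that date as its old value. If there is no later
    change, the current address applies.
    """
    later_changes = sorted(
        (
            change
            for change in customer_changes
            if change["customer_id"] == customer_id
            and change["changed_on"] > on_date
        ),
        key=lambda change: change["changed_on"],
    )
    if later_changes:
        return later_changes[0]["old_address"]
    return current_address[customer_id]
```

Once each order carries its own copy of the address, reports no longer need to consult the change table at all.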