I want to create an application similar to a library.
A little bit of context: I have hundreds of books, and I want to keep a record of what I have, which tome ...
I eventually want to create a website and let other people record their collections.
Now for the technical part: I want to use MongoDB because I like the project, I have never used a NoSQL database, and I like the JSON format. I have not looked much into the MongoDB docs yet.
Also, an important reminder: I can own many books, and the same book can be owned by many people.
Here is what the database would look like (still WIP):
------------------------------------------------
User
------------------------------------------------
+ id
+ first_name
+ last_name
+ mail
+ language
+ etc.
------------------------------------------------
------------------------------------------------
Book
------------------------------------------------
+ id
+ title
+ volume_number
+ edition
+ demographic
+ language
+ condition
+ isbn
+ editor
+ etc.
------------------------------------------------
------------------------------------------------
editor
------------------------------------------------
+ id
+ name
+ etc.
------------------------------------------------
------------------------------------------------
demographic
------------------------------------------------
+ id
+ name
+ etc.
------------------------------------------------
------------------------------------------------
languages
------------------------------------------------
+ id
+ name
+ etc.
------------------------------------------------
------------------------------------------------
authors
------------------------------------------------
+ id
+ first_name
+ last_name
+ etc.
------------------------------------------------
or in JSON like
Now my questions:
How should I link a book to a user? The easy way would be to put everything under the user collection. But if, for example, I have all the Game of Thrones books, there is a high chance other people have them too, so the same book would be repeated once per user (same title, ISBN, author, etc.; everything is the same except the condition: new, used ...). I'm not sure that is good.
The user will also be able to remove a book from their collection, for example after selling it.
I feel like if I put every book under the user document I can use MongoDB, but if not, I don't know ...
Here is an example:
var user = {
    id: ObjectId("1"),
    identity: {
        first: "Alan",
        last: "Turing",
        mail: "eichiro.oda@japan.com",
        language: "1"
    },
    collections: {
        mangas: [
            {
                id: ObjectId("2"),
                title: "One Piece",
                volume: 1,
                edition: 5, // reference to the edition document below
                // ...
            },
            {
                id: ObjectId("3"),
                title: "One Piece",
                volume: 1,
                edition: 6, // reference to the edition document below
                // ...
            }
        ],
        comics: [
            {
                id: ObjectId("4"),
                title: "Spider-man",
                volume: 1,
                edition: 'Limited',
                // ...
            }
        ]
    }
}
var edition = [
    {
        id: ObjectId("5"),
        title: "Collector"
    },
    {
        id: ObjectId("6"),
        title: "Classic"
    }
]
Is there another way to do that? My other thought would be to put everything under the book collection, but I'm not sure how to do it since a user can sell their book.
var book = {
    title: "One Piece",
    volume: 1,
    edition: 6, // reference to an edition document
    // ...
    ownedBy: [
        {
            id: ObjectId("2"),
            identity: {
                first: "Alan",
                last: "Turing",
                mail: "eichiro.oda@japan.com",
                language: "1"
            }
        },
        {
            id: ObjectId("3"),
            identity: {
                first: "Alan",
                last: "Turing",
                mail: "eichiro.oda@japan.com",
                language: "1"
            }
        }
    ]
}
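A third layout, sitting between the two above, could be sketched as follows: each book is stored once, and a separate record links a user to a book and carries the per-copy condition. This is a minimal plain JavaScript sketch over in-memory arrays, not MongoDB calls; the ownerships collection, the two helper functions, the string ids, and the second user are assumptions made for illustration and are not part of the schema above.

```javascript
// Hypothetical in-memory stand-ins for three collections.
const users = [
  { id: "u1", first: "Alan", last: "Turing" },
  { id: "u2", first: "Ada", last: "Lovelace" }, // made-up second owner
];

const books = [
  { id: "b1", title: "One Piece", volume: 1, edition: "Classic" },
  { id: "b2", title: "Spider-man", volume: 1, edition: "Limited" },
];

// One record per owned copy: who owns which book, and in what condition.
let ownerships = [
  { userId: "u1", bookId: "b1", condition: "new" },
  { userId: "u2", bookId: "b1", condition: "used" },
  { userId: "u1", bookId: "b2", condition: "used" },
];

// Resolve a user's collection by following the links to the shared books.
function booksOwnedBy(userId) {
  return ownerships
    .filter((o) => o.userId === userId)
    .map((o) => ({
      ...books.find((b) => b.id === o.bookId),
      condition: o.condition,
    }));
}

// Selling a book only removes the link; the book itself stays for other owners.
function removeFromCollection(userId, bookId) {
  ownerships = ownerships.filter(
    (o) => !(o.userId === userId && o.bookId === bookId)
  );
}

console.log(booksOwnedBy("u1").map((b) => b.title));
removeFromCollection("u1", "b1");
console.log(booksOwnedBy("u1").map((b) => b.title));
```

With this shape, the shared book data is written once, and only the small ownership record is repeated per user.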
For the book ID, people usually use the ISBN, but I read on another Stack Overflow thread that some books have the same ISBN but aren't the same book.
Is MongoDB good for what I want to do? I honestly didn't look into other database types because I wanted to try Mongo, but if something better suited exists, I will go for that. Note that I really doubt the database will contain millions/billions of entries.
Do you have a schema recommendation? It's my first time creating a database, and I want something that will last; I don't want to have to recreate it in a few years if possible.
Performance-wise, is there a difference if I put everything under the user document or under the book document?
Here are some of the threads I found, but they are quite old now =/
Database schema for a library
A library database
Thanks for reading my bad English, and thanks for any help you are able to provide.
Related
I made a web scraper that visits many predefined online stores, collects product data, and saves it in a structured way in my database, something like this:
{
    title: "Samsung Galaxy S20 5G",
    price: 1250,
    description: "some text here",
    brand: "Samsung",
    specification: {
        ram: "12GB",
        Capacity: "128GB"
    },
    sellerId: 2,
    category: "smartphones",
    internalProductId: null
}
I want to use this data to make a price comparison website,
something like this website.
My problem is that I don't know how to map/connect/match these collected products to the original product on my website, so that their seller and price are shown under that original product. (Here, internalProductId is the ID of the original product on my website.)
For example, I have this product on my website:
Samsung A71 128GB/8Gb id: 1
After crawling several online stores, I want to use a technology/algorithm to analyze the collected products, recognize the ones that match my products, and give them the appropriate internalProductId.
I'm interested in any accurate solution, matching algorithm, technology, or suggestion. (I'm not interested in machine learning solutions.)
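One non-ML direction the question points toward is plain token overlap between titles. The sketch below is only an illustration under assumptions: the tokens, jaccard, and matchProduct helpers, the 0.5 threshold, and the scraped title are all made up here, and real store data would likely need extra normalization (units, model aliases, brand synonyms).

```javascript
// Split a title into lowercase alphanumeric tokens,
// e.g. "128GB/8Gb" becomes the tokens "128gb" and "8gb".
function tokens(title) {
  return new Set(title.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a, b) {
  const shared = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : shared / union;
}

// Return the id of the best-scoring catalog product, or null when
// nothing reaches the (assumed) threshold.
function matchProduct(scraped, catalog, threshold = 0.5) {
  let best = null;
  let bestScore = 0;
  for (const product of catalog) {
    const score = jaccard(tokens(scraped.title), tokens(product.title));
    if (score > bestScore) {
      best = product;
      bestScore = score;
    }
  }
  return best !== null && bestScore >= threshold ? best.id : null;
}

const catalog = [
  { id: 1, title: "Samsung A71 128GB/8Gb" },
  { id: 2, title: "Samsung Galaxy S20 5G" },
];

// Hypothetical scraped record; only the title is used for matching here.
const scraped = { title: "Samsung Galaxy A71 8GB 128GB", internalProductId: null };
scraped.internalProductId = matchProduct(scraped, catalog);
console.log(scraped.internalProductId);
```

Records that fall below the threshold keep a null internalProductId, so they can be reviewed by hand instead of being attached to the wrong product.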
What I want to do is avoid data duplication (embedded), but also avoid fields relations by IDs.
Example:
{ Books: [
{ id: 1, name: "Foo1", author: referenceToTheAuthorWithId1 },
{ id: 2, name: "Foo2", author: referenceToTheAuthorWithId1 }
]}
{ Authors: [
{id: 1, name: "Bar"}
]}
So each time I want to get a book, it will also retrieve its author, but that author object would be a reference to the Author collection.
You can manually set _id fields for all of the documents you insert, and you can have identical ids in documents in various collections that refer to some common data.
In your example, if you replaced Authors with Authorships it would make some sense to have authorship id be the same as book id (at least this schema would support an author writing multiple books).
However, such a schema does not support multiple authors per book.
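The shared-id idea from this answer could be sketched like this in plain JavaScript, with Maps standing in for collections. The authorships collection follows the answer's suggestion; the authorName field and the bookWithAuthor helper are assumptions made for the example.

```javascript
// Two hypothetical collections keyed by a manually chosen _id.
const books = new Map([
  [1, { _id: 1, name: "Foo1" }],
  [2, { _id: 2, name: "Foo2" }],
]);

// Each authorship deliberately reuses the _id of the book it belongs to,
// so the same author can appear for several books.
const authorships = new Map([
  [1, { _id: 1, authorName: "Bar" }],
  [2, { _id: 2, authorName: "Bar" }],
]);

// Look up both documents with the same id and merge them.
function bookWithAuthor(id) {
  const book = books.get(id);
  const authorship = authorships.get(id);
  if (!book || !authorship) return null;
  return { ...book, author: authorship.authorName };
}

console.log(bookWithAuthor(2));
```

Because there is exactly one authorship per book id, the limitation noted above still holds: a second author for the same book has nowhere to go.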
Recently I moved my data model from Firebase to Firestore. All my code is working, but I'm having some ugly trouble with the nested queries I need to retrieve some data. Here is the point:
Right now my data model for this part looks like this (yes, another followers/feed example):
{
    "Users": { //Collection
        "UserId1" : { //Document
            "Feed" : { //Subcollection of Id of posts from users this user Follow
                "PostId1" : { //Document
                    "timeStamp" : "SomeDate"
                },
                "PostId2" : {
                    "timeStamp" : "SomeDate"
                },
                "PostId3" : {
                    "timeStamp" : "SomeDate"
                }
            }
            //Some data
        }
    },
    "Posts":{ //Collection
        "PostId1":{ //Document
            "Comments" :{ //Subcollection
                "commentId" : { //Document
                    "authorId": "UserId1"
                    //comentsData
                }
            },
            "Likes" : { //Subcollection
                "UserId1" : { //Document
                    "liked" : true
                }
            }
        }
    }
}
My problem is that to retrieve the posts in a user's feed I have to query as follows:
Get the last X documents ordered by timeStamp from my Feed:
feedCol(userId).orderBy(CREATION_DATE, Query.Direction.DESCENDING).limit(limit)
After that, I have to run a single query for each post retrieved from the list: workoutPostCol.document(postId)
Now I have the data for each post, but I want to show the username, picture, points, etc. of the author, which live in a different document, so again I have to run another single query for each authorId retrieved in the list of posts: userSocial(userId).document(toId)
Finally, and no less important, I need to know whether my current user already liked that post, so I need to run a single query for each post (again) and check whether my userId is inside posts/likes/{userId}
Right now everything is working, but given that Firestore pricing depends on the number of database calls, and that this does not make my queries any simpler, I don't know whether my data model is just not a good fit for this kind of database and I should move to normal SQL or go back to Firebase.
Note: I know that everything would be a lot easier if I moved these subcollections (likes, feed, etc.) into array lists inside my user or post documents, but a document is limited to 1MB, and if it grows too much it will crash in the future. On the other hand, Firestore doesn't allow subdocument queries (yet) or an OR clause using multiple whereEqualTo.
I have read a lot of posts from users looking for a simple way to store this kind of ID relationship to do joins and queries across their collections. Using array lists would be awesome, but the 1MB limit restricts it too much.
I hope someone will be able to clarify this, or at least teach me something new. Maybe my model is just crap and there is a simpler, easier way to do this? Or maybe my model is not possible for a non-SQL database.
I'm not 100% sure this solves the problem entirely, since there may be edge cases for your usage, but after five minutes of quick thinking I feel the following could solve your problem:
You can consider using a model similar to Instagram's. If my memory serves me well, what they use is an events-based collection. By events in this specific context I mean all actions the user takes. So a comment is an event, a like is an event etc.
This would make it so that you'll need three main collections in total.
users
-- userID1
---- userdata (profile pic, bio etc.)
---- postsByUser : [postID1, postID2]
---- followedBy : [userID2, ... ]
---- following : [userID2, ... ]
-- userID2
---- userdata (profile pic, bio etc.)
posts
-- postID1 (timestamp, so it's sortable)
---- contents
---- author : userID1
---- authorPic : authorPicUrl
---- authorPoints : 12345
---- taggedUsers : []
---- comments
------ comment1 : { copy of comment event }
---- likes : [userID1, userID2]
-- postID2 (timestamp)
---- contents
...
events
-- eventID1
---- type : comment
---- timestamp
---- byWhom : userID
---- toWhichPost : postID
---- contents : comment-text
-- eventID2
---- type : like
---- timestamp
---- byWhom : userID
---- toWhichPost : postID
For your user-bio page, you would query users.
For the news feed, you would query posts for all posts by the userIDs your user is following in the last day (or any given timespan).
For the activity feed page (comments, likes, etc.), you would query events that are relevant to your userID, limited to the last day (or any given timespan).
Finally, query the following days for posts/events as the user scrolls (or if there's no new activity in those days).
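The news feed query described above could be sketched roughly like this. It is plain JavaScript over in-memory data, not Firestore API calls; the newsFeed helper, the sample timestamps, and the likedByMe flag are assumptions for illustration. Because each post carries its author fields and likes list, one pass over posts answers what previously took one read per post, per author, and per like.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const now = 10 * DAY_MS; // fixed "current time" so the example is repeatable

// Hypothetical users collection: only the "following" list matters here.
const users = {
  userID1: { following: ["userID2"] },
  userID2: { following: [] },
};

// Hypothetical posts collection with denormalized author data and likes.
const posts = [
  { id: "postID1", author: "userID2", authorPoints: 12345, timestamp: now - 1000, likes: ["userID1"] },
  { id: "postID2", author: "userID2", authorPoints: 12345, timestamp: now - 2 * DAY_MS, likes: [] },
  { id: "postID3", author: "userID1", authorPoints: 10, timestamp: now - 500, likes: [] },
];

// Posts by followed users within the timespan, newest first,
// each tagged with whether the requesting user already liked it.
function newsFeed(userId, currentTime, spanMs = DAY_MS) {
  const following = new Set(users[userId].following);
  return posts
    .filter(
      (p) =>
        following.has(p.author) &&
        p.timestamp > currentTime - spanMs &&
        p.timestamp <= currentTime
    )
    .sort((a, b) => b.timestamp - a.timestamp)
    .map((p) => ({ ...p, likedByMe: p.likes.includes(userId) }));
}

const feed = newsFeed("userID1", now);
console.log(feed.map((p) => p.id));
```

Widening spanMs as the user scrolls corresponds to the "query the following days" step.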
Again, this is merely a quick thought, I know the elders of SOF have a habit of crucifying these usually, so forgive me fellow members of SOF if this answer has flaws :)
Hope it helps Francisco,
Good luck!
Every time I contemplate using NoSQL for a solution, I get hung up on the lack of rich querying functionality. It may very well be my lack of understanding of NoSQL. It might also be because I'm very comfortable with SQL. From my understanding, NoSQL lends itself well to simple schema scenarios (so it's probably not going to work well for a relational database where you have 50+ tables). Even for trivial scenarios I always seem to want rich query functionality. Let's take a recipe database as a trivial example.
While the schema is no doubt trivial, you would definitely want rich querying ability. You would probably want to search by the following (and more):
Title
Tag
Category
id
Likes
User who created recipe
create date
rating
dietary restrictions
You would also want to combine these criteria in any combination. While I know most NoSQL solutions have secondary indexes, doesn't this type of querying requirement severely limit how many solutions NoSQL is relevant for? I usually need this rich querying ability. Another good example would be a bug tracking application.
I don't think you want to kick off a MapReduce job every time someone wants to search the database (I think this would be analogous to doing table scans most of the time in a traditional relational model). So I would assume there would be a lot of queries where you would have to loop through each entity and look for the criteria you wanted to search for (which would probably be slow). I understand you can run nightly MapReduce jobs to either analyze the data or maybe normalize it into a typical relational database structure for reports.
Now, I can see it being useful for scenarios where you would most likely always have to read all the data anyway. Think of a web server log or maybe an IoT type of app where you're collecting massive amounts of data (like sensor collection) and doing nightly analysis.
So is my understanding of NoSQL off, or is there a limit to the number of scenarios that it works well with?
I think the issue you are encountering is that you are approaching NoSQL with the same design mindset that you would with SQL. You mentioned "rich querying" several times. To me, that points towards design flaws (using only reference ids/trying to define relationships). A significant concept in NoSQL is that data can be repeated (and often should be). Your recipe example is actually a great use case for NoSQL. Here's how I would approach it using 3 of the models you mention (for simplicity's sake):
Recipe = {
    _id: "a001",
    name: "Burger",
    ingredients: [
        {
            _id: "b001",
            name: "Beef"
        },
        {
            _id: "b002",
            name: "Cheese"
        }
    ],
    createdBy: {
        _id: "c001",
        firstName: "John",
        lastName: "Doe"
    }
}
Person = {
    _id: "c001",
    firstName: "John",
    lastName: "Doe",
    email: "jd@email.com",
    preferences: {
        emailNotifications: true
    }
}
Ingredient = {
    _id: "b001",
    name: "Beef",
    brand: "Agri-co",
    shelfLife: "3 days",
    calories: 300
};
The reason I designed it this way is expressly for the purpose of its existence (assuming it's something like allrecipes.com). When searching/filtering recipes, you can filter by the author, but their email preferences are irrelevant. Similarly, the shelf life and brand of the ingredient are irrelevant. The schema is designed for the specific use case, not just because your data needs to be saved. Now here are a few of your mentioned queries (mongo):
db.recipes.find({name: "Burger"});
db.recipes.find({"ingredients.name": {$nin: ["Cheese", "Milk"]}}); // dietary restrictions (match on the embedded name field)
Your rich querying concerns have now been reduced to single queries in a single collection.
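To make the single-collection point concrete, here is a plain JavaScript sketch (not MongoDB itself) that combines two of the criteria over embedded recipe documents. The findRecipes helper, its option names, and the second recipe are assumptions added for illustration.

```javascript
// Hypothetical recipes collection using the embedded shape shown above.
const recipes = [
  {
    _id: "a001",
    name: "Burger",
    ingredients: [
      { _id: "b001", name: "Beef" },
      { _id: "b002", name: "Cheese" },
    ],
    createdBy: { _id: "c001", firstName: "John", lastName: "Doe" },
  },
  {
    _id: "a002", // made-up second recipe so the filter has something to keep
    name: "Steak",
    ingredients: [{ _id: "b001", name: "Beef" }],
    createdBy: { _id: "c001", firstName: "John", lastName: "Doe" },
  },
];

// Combine criteria in one pass: no excluded ingredient, optional author filter.
function findRecipes({ excludedIngredients = [], authorId = null } = {}) {
  const excluded = new Set(excludedIngredients);
  return recipes.filter(
    (r) =>
      r.ingredients.every((i) => !excluded.has(i.name)) &&
      (authorId === null || r.createdBy._id === authorId)
  );
}

const dairyFree = findRecipes({ excludedIngredients: ["Cheese", "Milk"], authorId: "c001" });
console.log(dairyFree.map((r) => r.name));
```

Everything the filter needs (ingredient names, author id) is already inside each recipe document, which is why no second collection has to be consulted.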
The downside of this design is slower write speed. You need more logic on the backend, with the potential for more programmer error. The write speed is also slower than SQL due to accessing the various models to grab relevant information. That being said, how often is it viewed vs. how often is it written/edited? (this was my comment on reading trumping writing) The other major downside is the necessity of foresight. The relationship between an ingredient and a recipe doesn't change forms. But the information your application requires might. Editing a noSQL model tends to be more difficult than editing a SQL table.
Here's one other contrived example using the same models to emphasize my point about purposeful design. Assume your new site is on famous chefs instead of a recipe database:
Person = {
    _id: "c001",
    firstName: "Paula",
    lastName: "Deen",
    recipeCount: 15,
    commonIngredients: [
        {
            _id: "b001",
            name: "Butter",
            count: 15
        },
        {
            _id: "b002",
            name: "Salted Butter",
            count: 15
        }
    ],
    favoriteRecipes: [
        {
            _id: "a001",
            name: "Fried Butter",
            calories: "3000"
        }
    ]
};
Recipe = {
    _id: "a001",
    name: "Fried Butter",
    ingredients: [
        {
            _id: "b001",
            name: "Butter"
        }
    ],
    directions: "Fry butter. Eat.",
    calories: "3000",
    rating: 99,
    createdBy: {
        _id: "c001",
        firstName: "Paula",
        lastName: "Deen"
    }
};
Ingredient = {
    _id: "b001",
    name: "Butter",
    brand: "Butterfields",
    shelfLife: "1 month"
};
Both of these designs use the same information, but they are modeled for the specific reason you bothered gathering the information. Now, you have the requisite information for a chef list page and typical sorting/filtering. You can navigate from there to a recipe page and have that info available.
Design for the use case, not to model relationships.
Imagine you plan to create a social network running on GAE/Java where each user has a set of properties (i.e. age, current town, interests).
Alternative 1: classical approach - the user_id and every property as a "row"
entity  property_1  property_2  property_3
------  ----------  ----------  -----------------
bob     missing     NY          [football, books]
tom     34          missing     [books, horses]
Alternative 2: entity-attribute-value (EAV)
entity  attribute  value
------  ---------  -----------------
bob     town       NY
bob     interests  [football, books]
tom     age        34
tom     interests  [books, horses]
What pros/cons do you think each option has? My main concerns are:
What is the impact on multi-criteria searches (i.e. "give me the users with ages under 45 that live in NY and like books")
What GAE/J implications could it have? (i.e. indexes, datastore size...)
How to model attributes with multiple values ("interests" for example) if you want to retrieve "users that like books" ?
I think the second alternative is more flexible and maybe easier to implement, but I would like to know what other experienced developers think.
Thank you.
Did you have a look at Building Scalable, Complex Apps on App Engine from Google I/O 2009? The video has terrible sound-quality, but it covers your topics. He talks about list properties and merge-joins and their limitations.
If the flexibility of EAV is essential for your app then use it, otherwise do not since it'll have pitfalls in querying.
This will return all entities that have books in interests:
final Iterator<EAV> eavs = Iterators.transform(
        datastoreService.prepare(
                new Query(EAV.class.getSimpleName())
                        .addFilter("a", FilterOperator.EQUAL, "interests")
                        .addFilter("v", FilterOperator.EQUAL, "books"))
                .asIterator(),
        new Function<Entity, EAV>() {
            @Override
            public EAV apply(final Entity input) {
                return new EAV(input);
            }
        });
while (eavs.hasNext()) {
    logger.debug("eav: " + eavs.next());
}
This tries to fetch entities that have books in interests and are aged under 45, but it returns nothing, since no single row can hold both pairs of values for a and v:
final Iterator<EAV> eavs = Iterators.transform(
        datastoreService.prepare(
                new Query(EAV.class.getSimpleName())
                        .addFilter("a", FilterOperator.EQUAL, "interests")
                        .addFilter("v", FilterOperator.EQUAL, "books")
                        .addFilter("a", FilterOperator.EQUAL, "age")
                        .addFilter("v", FilterOperator.LESS_THAN, 45))
                .asIterator(),
        new Function<Entity, EAV>() {
            @Override
            public EAV apply(final Entity input) {
                return new EAV(input);
            }
        });
while (eavs.hasNext()) {
    logger.debug("eav: " + eavs.next());
}
The result isn't surprising, as querying in Bigtable is not even close to the flexibility of SQL (no joins, for example). The working solution would probably be multiple queries and manually combining and parsing their results.
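The "multiple queries, then combine" workaround could look roughly like the sketch below. It is plain JavaScript over in-memory rows rather than the datastore API, and the entitiesWhere and intersect helpers are hypothetical names introduced only for this example.

```javascript
// EAV rows from the question, held in memory for illustration.
const rows = [
  { entity: "bob", attribute: "town", value: "NY" },
  { entity: "bob", attribute: "interests", value: ["football", "books"] },
  { entity: "tom", attribute: "age", value: 34 },
  { entity: "tom", attribute: "interests", value: ["books", "horses"] },
];

// One "query" per criterion: the set of entities that have a row for
// `attribute` whose value (or any element of a list value) passes `test`.
function entitiesWhere(attribute, test) {
  const matches = rows.filter(
    (r) =>
      r.attribute === attribute &&
      (Array.isArray(r.value) ? r.value.some(test) : test(r.value))
  );
  return new Set(matches.map((r) => r.entity));
}

// Manual combination step: keep only entities present in every result set.
function intersect(...sets) {
  return sets.reduce((acc, s) => new Set([...acc].filter((e) => s.has(e))));
}

const likesBooks = entitiesWhere("interests", (v) => v === "books");
const under45 = entitiesWhere("age", (v) => v < 45);
const result = [...intersect(likesBooks, under45)];
console.log(result); // the only entity satisfying both criteria is "tom"
```

Each extra criterion costs another query plus an intersection in application code, which is the querying pitfall of EAV mentioned earlier.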
OTOH with "classical approach" it's trivial:
final Iterator<Person> persons = Iterators.transform(
        datastoreService.prepare(
                new Query(Person.class.getSimpleName())
                        .addFilter("interests", FilterOperator.EQUAL, "books")
                        .addFilter("age", FilterOperator.NOT_EQUAL, null)
                        .addFilter("age", FilterOperator.LESS_THAN, 45))
                .asIterator(),
        new Function<Entity, Person>() {
            @Override
            public Person apply(final Entity input) {
                return new Person(input);
            }
        });
while (persons.hasNext()) {
    logger.debug("person: " + persons.next());
}
This will print out tom's data.