News Database Design for a Desktop Application - sql-server

I am building an application that will process a large amount of information. The data was collected by web crawlers and describes news items: title, URL, publication date, category and content. The crawled data is in XML format, and I will load it into my application.
From there, 10 users will go through every news item and tag its category manually by reading the title. I've defined 9 main categories, and for each news item the users will rate from 0 to 5 how strongly it belongs to each category.
Users will also search for news by title and decide whether a news item discusses the same event as another one, or is merely similar to another one (like a news item about a football game at 3 PM and one about a football game at 5 PM).
I have no problem writing the application itself. I just need help designing one or more tables that can link news items that discuss similar events or the same event, since many news items can discuss the same event.
So far I've done something like this:
Table News
-------------
ID
Title
URL
PublicationDate
NewsContent
Table Category
-------------
NewsID
User_ID
Economy
Politics
Present_Day
Sport
Technology
Showbiz
Culture
Region
World
Table User
-------------
ID
FirstName
LastName
Each category field in the Category table is a tinyint (I'm using SQL Server), and a check constraint restricts the values users can enter to the range 0 to 5. I don't know whether this is the right approach for the database design so far. I still need to add the table or tables that record which news items are similar or discuss the same event. One idea is a Similar_News table with fields like News_ID, SimilarNews1_ID, SimilarNews2_ID and so on, which must also record which user made the link, but this sounds like a flawed design to me.
Any help is appreciated, thank you.

Here are some suggestions. In the CATEGORY table you have created 9 different columns (Economy, Politics, etc.). What if a few days, months or years down the line there is a new category? In that case you will have to modify your database design. Instead, you could give your CATEGORY table the following structure.
CategoryId
Category
And have one more table to store the actual news categorization.
Table: NewsCategory
NewsId
CategoryId
CategoryWeight (This will store the rating from 1-5)
If the user feels that a news does not belong to a particular category then no row will be inserted in this table for that category. Such a structure will give you more flexibility to insert new categories in future without changing the database design. You just have to insert new rows in the Category table.
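As a rough sketch of the structure above, the two tables could look like this. It uses Python's sqlite3 module as a stand-in for SQL Server so that it runs anywhere. The UserId column and the composite primary key are assumptions added to cover the question's requirement of per-user ratings; they are not part of the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (
    CategoryId INTEGER PRIMARY KEY,
    Category   TEXT NOT NULL UNIQUE
);
CREATE TABLE NewsCategory (
    NewsId         INTEGER NOT NULL,
    CategoryId     INTEGER NOT NULL REFERENCES Category(CategoryId),
    UserId         INTEGER NOT NULL,  -- assumption: records who gave the rating
    CategoryWeight INTEGER NOT NULL CHECK (CategoryWeight BETWEEN 1 AND 5),
    PRIMARY KEY (NewsId, CategoryId, UserId)
);
""")

# A new category is just a new row, not a schema change.
conn.executemany(
    "INSERT INTO Category (CategoryId, Category) VALUES (?, ?)",
    [(1, "Economy"), (2, "Politics"), (3, "Sport")],
)

# Two users rate news item 100; a category a user skips simply has no row.
conn.executemany(
    "INSERT INTO NewsCategory (NewsId, CategoryId, UserId, CategoryWeight) "
    "VALUES (?, ?, ?, ?)",
    [(100, 1, 1, 4), (100, 2, 1, 2), (100, 1, 2, 5)],
)

# Average weight per category for one news item.
avg_weights = conn.execute("""
    SELECT c.Category, AVG(nc.CategoryWeight)
    FROM NewsCategory nc
    JOIN Category c ON c.CategoryId = nc.CategoryId
    WHERE nc.NewsId = ?
    GROUP BY c.Category
    ORDER BY c.Category
""", (100,)).fetchall()
print(avg_weights)
```

In SQL Server the same idea would use tinyint columns and a CHECK constraint, much like the one already described in the question.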
For storing similar news I would recommend the following approach.
Create a table event and store the details of the event in this table.
EventId
EventDescription
Sample Data
EventId: 55
EventDescription: Euro 2016 Belgium vs Italy
Now you can include this EventId in your News table. This way you can pull up all the news that are related to this event.
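A minimal sketch of the event approach, again using Python's sqlite3 module as a stand-in for SQL Server. The news titles are made up for illustration, and the nullable EventId column is an assumption about how news items without an assigned event would be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Event (
    EventId          INTEGER PRIMARY KEY,
    EventDescription TEXT NOT NULL
);
CREATE TABLE News (
    ID      INTEGER PRIMARY KEY,
    Title   TEXT NOT NULL,
    EventId INTEGER REFERENCES Event(EventId)  -- NULL until an event is assigned
);
""")

conn.execute(
    "INSERT INTO Event (EventId, EventDescription) VALUES (?, ?)",
    (55, "Euro 2016 Belgium vs Italy"),
)
conn.executemany(
    "INSERT INTO News (ID, Title, EventId) VALUES (?, ?, ?)",
    [
        (1, "Belgium vs Italy: match preview", 55),  # hypothetical titles
        (2, "Belgium vs Italy: match report", 55),
        (3, "Unrelated story", None),
    ],
)

# Pull up every news item attached to event 55.
rows = conn.execute(
    "SELECT Title FROM News WHERE EventId = ? ORDER BY ID", (55,)
).fetchall()
event_titles = [title for (title,) in rows]
print(event_titles)
```

Note that a single EventId column lets each news item belong to at most one event; if a news item can cover several events, a separate linking table would be needed.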

Creating a Similar_News table with n fields to store the similar News_Id values doesn't sound like a good idea. How many fields would you create? 3? 10?
You are modeling an n-to-m relationship, so I would use only two fields (three if you want to store the user ID). For example, if news 1 is similar to news 2 and 3, and news 3 is similar to news 4, you insert the rows:
New_ID    Similar_New_Id
--------------------------
1         2
1         3
3         4
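The pair table above can be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server. The User_Id column is the optional third field mentioned above. The CHECK constraint that stores each pair only once (smaller ID first) and the similar_to helper are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Similar_News (
    New_ID         INTEGER NOT NULL,
    Similar_New_Id INTEGER NOT NULL,
    User_Id        INTEGER NOT NULL,  -- which user declared the similarity
    PRIMARY KEY (New_ID, Similar_New_Id, User_Id),
    CHECK (New_ID < Similar_New_Id)   -- assumption: store each pair one way only
);
""")

conn.executemany(
    "INSERT INTO Similar_News (New_ID, Similar_New_Id, User_Id) VALUES (?, ?, ?)",
    [(1, 2, 1), (1, 3, 1), (3, 4, 1)],
)

def similar_to(news_id):
    """Return IDs linked to news_id, looking at both sides of each pair."""
    rows = conn.execute("""
        SELECT Similar_New_Id FROM Similar_News WHERE New_ID = ?
        UNION
        SELECT New_ID FROM Similar_News WHERE Similar_New_Id = ?
        ORDER BY 1
    """, (news_id, news_id)).fetchall()
    return [other_id for (other_id,) in rows]

print(similar_to(3))
```

Because similarity is symmetric, the lookup has to check both columns. This returns only directly linked news items; following chains such as 1 to 3 to 4 would need a recursive query.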
Another approach would be to use a NoSQL DB to store flexible structures, for example:
"News": {
    "User_Id": 1,
    "Category": {
        "Economy": 3,
        "Politics": 4
    },
    "Similar_News": [1, 2, 3]
}
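To show what "flexible structure" means in practice, here is a small sketch in plain Python that round-trips the document above through JSON. No particular NoSQL product or client API is assumed; a document store would hold the encoded form.

```python
import json

# The document from the example above, as a Python dict.
news_doc = {
    "News": {
        "User_Id": 1,
        "Category": {"Economy": 3, "Politics": 4},
        "Similar_News": [1, 2, 3],
    }
}

# Serialize and parse it back, as a document store would on write and read.
encoded = json.dumps(news_doc)
decoded = json.loads(encoded)

# Adding a category or another similar news ID needs no schema change.
decoded["News"]["Category"]["Sport"] = 1
decoded["News"]["Similar_News"].append(4)
print(decoded)
```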

Related

Extend existing table or design a new one?

I have a database with a table named Category; each category contains different articles. Over time, Category alone is no longer enough to distinguish the articles on the website, so to meet our needs we are preparing to create new tables.
Our Category table looks like this:
Category
-------------
id
name
description
image
We want to create new tables (more than one) like this:
Topic
--------------
id
name
description
image
icon(.svg)
display(boolean)
The new tables are just like the Category table, but with one or two more columns.
In this situation, which choice is better for content management and server/AWS RDS query efficiency? (Or can't we get both?)
Create a new table
Add a column such as Class (class1 = category, class2 = topic, class3 = ...) to redefine the existing content?
Or other suggestions?
It makes more sense to create a new table because these are 2 separate concepts in your domain logic. Also, it is better for query performance and you get more flexibility because, if you need to, you can easily add columns to each table independently.

Database design - Managing documents

I asked a more generic question yesterday and did some work based on it. This question is more to the point and concerns something I am trying to work out.
So my application has a departments table to handle departments. So I am able to make different departments within my application such as Marketing and Finance.
The problem is, I know what departments I need to make, and these will be created beforehand (but I have made it like a CMS so an admin can edit departments etc). With the departments created, I envision something like this
So a user can choose the department from a dropdown (remember, this is after departments are created). When they do this, the document dropdown should populate.
This is my problem, how can I associate specific documents to a department? Each document requires different inputs, so I would imagine they need to be different tables? At the moment I have
But this doesn't really solve my problem, which is being able to state that the Marketing department has a Brief document and an Overview document.
How could I go about doing this seeing that I do not specifically have a table for each department? Would I need to create one for each department?
Thanks
You can do the following:
If the variety of document information you need to store is too large,
create a table which has 5 columns:
id, department_id, created_at, updated_at, property
So for each property you will have a record in the table, e.g.:
id   department_id   created_at   updated_at   property
1    454             2015-08-20   2015-08-22   x:34
2    454             2015-08-26   2015-08-26   z:234
3    934             2015-08-25   2015-08-26   y:45
This way you won't need a table for each document type.
EDIT: another option is adding one column for the property name and one for its value:
id, department_id, created_at, updated_at, property_name, property_value
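The second option (separate name and value columns) can be sketched like this with Python's sqlite3 module. The table name document_property is hypothetical, and the rows reuse the sample values from the table above, with each property split into name and value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_property (  -- hypothetical table name
    id             INTEGER PRIMARY KEY,
    department_id  INTEGER NOT NULL,
    created_at     TEXT NOT NULL,
    updated_at     TEXT NOT NULL,
    property_name  TEXT NOT NULL,
    property_value TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO document_property VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 454, "2015-08-20", "2015-08-22", "x", "34"),
        (2, 454, "2015-08-26", "2015-08-26", "z", "234"),
        (3, 934, "2015-08-25", "2015-08-26", "y", "45"),
    ],
)

def properties_for(department_id):
    """Collect all name/value pairs stored for one department."""
    rows = conn.execute(
        "SELECT property_name, property_value FROM document_property "
        "WHERE department_id = ? ORDER BY id",
        (department_id,),
    ).fetchall()
    return dict(rows)

print(properties_for(454))
```

All values are stored as text here, so the application has to convert them to numbers or dates itself; that is the usual trade-off of this layout.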

What is right table design and proper searching queries for my database?

I am trying to design MySQL tables for the following purpose. The site is about arbitrary topics and the items that belong to one or several topics. For example, take the topic "Batman". Its items belong to several categories: movies, books, video games and cars (the Batmobile). The number of categories is 10-20. My idea is to create these tables:
Topic
-------
id
name
Item
-------
id
name
category_id
description and some other common columns
Movie (and other categories, like Car, VideoGame, Book,...)
-------
id
item_id
specific columns for each category
Item_Topic (and other tables like Item_Category, Item_Movie, Item_Car, Item_Videogame, Item_Person...)
-------
item_id
topic_id (movie_id, car_id, person_id, ...)
Person
-------
item_id
name
role
So far, I think this is a good solution (or am I wrong?).
But I have the following two problems:
Movie actors and book authors aren't treated as items in my design. For actors and writers there is a Person table and an Item_Person table connecting items with persons. But what if the topic is "Stephen King"? How would I search for Stephen King's books and the movies he wrote screenplays for? The only approach that comes to mind is to change the table design and treat persons as items too. Is that a good idea? What table design solves this problem? And could you please advise what a general query for matching topics and items would look like with a proper table design?
I would like to show, on one page, all items belonging to one topic together with their detailed information (stored in Movie, Book, VideoGame, ...). What is the best query I can use?
Thank you very much for any answers!
Let me comment on this part of your design:
Movie (and other categories, like Car, VideoGame, Book,...)
-------
item_id
specific columns for each category
This is usually not the best approach when using the relational model. If possible, it's better to have a Category table with records for Movie, Car, VideoGame, Book, etc. That way it's easier to add new categories as needed.
If the item data varies widely based on category, there are techniques to handle that, but it's not pretty (commonly a table of item attributes and then your item table is a mapping table of attributes to values).
Update
Thanks to @RichardCZ for reminding me that the attribute mapping approach is called the EAV (entity-attribute-value) model.
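A minimal EAV sketch, using Python's sqlite3 module in place of MySQL. All table names, column names and sample rows here are hypothetical; the point is only that category-specific data lives in rows of item_attribute rather than in one table per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_attribute (          -- maps (item, attribute) to a value
    item_id      INTEGER NOT NULL REFERENCES item(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value        TEXT NOT NULL,
    PRIMARY KEY (item_id, attribute_id)
);
""")

conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Movie"), (2, "Book")])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(1, "Sample movie", 1), (2, "Sample book", 2)])
conn.executemany("INSERT INTO attribute VALUES (?, ?)",
                 [(1, "runtime_minutes"), (2, "page_count")])
conn.executemany("INSERT INTO item_attribute VALUES (?, ?, ?)",
                 [(1, 1, "120"), (2, 2, "300")])

def item_details(item_id):
    """Merge the common columns with the item's attribute rows."""
    name, category = conn.execute(
        "SELECT i.name, c.name FROM item i "
        "JOIN category c ON c.id = i.category_id WHERE i.id = ?",
        (item_id,),
    ).fetchone()
    details = {"name": name, "category": category}
    for attr_name, value in conn.execute(
        "SELECT a.name, ia.value FROM item_attribute ia "
        "JOIN attribute a ON a.id = ia.attribute_id WHERE ia.item_id = ?",
        (item_id,),
    ):
        details[attr_name] = value
    return details

print(item_details(1))
```

Adding a new category with its own attributes then only means inserting rows, at the cost of untyped values and more joins per query.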

access: how to add a workorder to a customer

I have a noob question, but hey, I'm learning :-)
I'm making a form based on two tables: tblCustomers and tblWorkorders.
My question is:
When I add a customer as a new record, this person is stored in the table tblCustomers. This works fine.
The problem is that I also have a table tblWorkorders, in which I store all the technical information, solutions and the customer's belongings (adapter, notebook bag, etc.).
My problem arises when, for example, a customer named John Doe comes back with another problem 2 weeks later. The table tblWorkorders should then contain 2 records with John Doe's problems. I think it has something to do with relationships between the tables. Can someone tell me where to find a good example or, if it's a short story, how to do this?
It is hard to explain this concept from scratch, so be prepared to do some further research on individual topics. Here is a place to start: http://office.microsoft.com/en-us/access-help/guide-to-table-relationships-HA010120534.aspx
The following is how you would use your tables:
You need to have a common field in both tables (it can be more than one field, but let's keep it simple). The easy way is to give tblCustomers a CustomerID field whose Data Type is set to AutoNumber (it does just what it says).
tblWorkorders will have the same field (it doesn't have to have the same name, but let's keep it simple), BUT its Data Type is Number with Field Size: Long Integer.
If you use Database Tools | Relationships to join the two tables on this field, developing forms and reports is a lot easier.
Your form will be based on the tblCustomers table (I know, let's keep it simple) and a subform will use the tblWorkorders table. The 'Link Master Fields' and 'Link Child Fields' properties will use the CustomerID from each table.
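The one-to-many relationship described above can be sketched outside Access with Python's sqlite3 module. AUTOINCREMENT stands in for Access's AutoNumber, and the columns other than CustomerID (CustomerName, WorkorderID, Problem) are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE tblCustomers (
    CustomerID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of AutoNumber
    CustomerName TEXT NOT NULL
);
CREATE TABLE tblWorkorders (
    WorkorderID INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID  INTEGER NOT NULL REFERENCES tblCustomers(CustomerID),
    Problem     TEXT NOT NULL
);
""")

# The customer is stored once.
cur = conn.execute("INSERT INTO tblCustomers (CustomerName) VALUES (?)", ("John Doe",))
john_id = cur.lastrowid

# Each visit adds one work order that points back to the same customer.
conn.executemany(
    "INSERT INTO tblWorkorders (CustomerID, Problem) VALUES (?, ?)",
    [(john_id, "First problem"), (john_id, "Second problem, two weeks later")],
)

workorders = conn.execute(
    "SELECT Problem FROM tblWorkorders WHERE CustomerID = ? ORDER BY WorkorderID",
    (john_id,),
).fetchall()
print(workorders)
```

The subform in Access does the equivalent of the final query: it shows only the work orders whose CustomerID matches the customer on the main form.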

Problems while designing a database to manage all kind of products like Amazon

First of all, sorry for my bad English, hehehe. I need some help: I want to design a database for a website, like a mini Amazon. This database will manage every kind of product (TVs, cars, computers, books, video games, pencils, tables, pants...), and each product must also have some properties (which will be indexed). For example, if the product is a book, the properties will be something like genre, year and author. If the product is a TV, the properties will be something like size, color and year. And if the product is a car, the properties will be something like year, color and model. So, this is my idea:
One table to manage departments (like electronics, books...)
One table to manage categories of the departments, this table will be a child of the previous. If the department is electronics, here will be audio, tv and video, games... (each category belongs to one department, the relationship is one department to many categories)
One table to manage the products (each product belongs to one category, the relationship is one category to many products)
One table to manage properties (like year, color, genre, model...)
One table to engage products with properties, this table will be called ProductProperties
I'm not sure if this is the best way. The database will be huge, and I will develop it on MySQL. This article talks about "Database Abstraction: Aggregation and Generalization" http://cs-exhibitions.uni-klu.ac.at/index.php?id=433, in other words generic objects (I think), but that approach is old (1970s). This article http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/ says, in the section "One table to hold all domain values", that this is the wrong way... I'm saying all of this because of the ProductProperties table: I don't know whether I should create this table or create specific tables for each kind of product.
Do you have any suggestion? Or do you have a better idea?
Thanks in advance, take care!!!
"1. One table to manage departments (like electronics, books...)"
"2. One table to manage categories of the departments, this table will be a child of the previous. If the department is electronics, here will be audio, tv and video, games... (each category belongs to one department, the relationship is one department to many categories)"
Why? Use one table, categories, forming a hierarchy. More flexible.
"3. One table to manage the products (each product belongs to one category, the relationship is one category to many products)"
Why? Allow m:n here: a product can be in many categories.
"Im not sure if this is the best way, the database will be huge"
Ah, no. Sorry. Nontrivial, yes. Huge? No. To give you an idea of huge: I have a database where I add 1.2 billion rows PER DAY, on average, to one specific table. THAT is big. You will end up with what, 100,000 items? Not even worth mentioning.
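The two suggestions in this answer (a single self-referencing category table and an m:n link between products and categories) can be sketched as follows. Python's sqlite3 module stands in for MySQL, and the table names, column names and sample rows are hypothetical. The recursive query needs a database that supports recursive common table expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)  -- NULL for a top-level department
);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_category (                -- m:n link table
    product_id  INTEGER NOT NULL REFERENCES product(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (product_id, category_id)
);
""")

conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "Electronics", None),
    (2, "TV and video", 1),
    (3, "Games", 1),
    (4, "Books", None),
])
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "Sample TV"), (2, "Sample game guide")])
# The game guide sits in two categories at once.
conn.executemany("INSERT INTO product_category VALUES (?, ?)",
                 [(1, 2), (2, 3), (2, 4)])

def products_under(category_id):
    """Names of products in a category or in any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM category WHERE id = ?
            UNION ALL
            SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT DISTINCT p.name
        FROM product p
        JOIN product_category pc ON pc.product_id = p.id
        JOIN subtree s ON s.id = pc.category_id
        ORDER BY p.name
    """, (category_id,)).fetchall()
    return [name for (name,) in rows]

print(products_under(1))
```

With this layout a department is simply a category without a parent, so the separate departments table from the question is no longer needed.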
Pablo89, the description of what you want is very close to what the AdventureWorks database for SQL Server does. There are many examples of using AdventureWorks on the Web, from web applications to reporting to BI.
Download and install SQL Server Express 2008 R2. Download and install the sample database for that product. Inspect the database design of AdventureWorks.
Use AdventureWorks as the example in any questions you post.
I use AdventureWorks because I use SQL Server. I am not saying it is better than other database products; I recommend it because it is what I know.
I do not think any database can work fast with 500,000,000 items. The complete tree of product categories for amazon.com contains 51,000 nodes (amazoncategories.info). Also, the data is updated hourly, so saved product information can be incorrect. I think the optimal way is to store only the category tree and fetch the product data at runtime using Amazon's API.
