Ways to store different kinds of messages in a DB

I am wondering what the possibilities are for storing this kind of data efficiently.
Let's say I have 100 different kinds of messages that I need to store. All messages share common data such as message ID, message name, sender, receiver, insert date, etc. Every kind of message also has its own unique columns that we want to store and index (for quick queries).
We don't want to use 100 different tables; that would be impossible to work with.
The best approach that I could come up with is to use 2-3 tables:
1. One for the common data.
2. One for the extra unique data, where every column has a generic name: foreign key, column1, column2, ..., column20. There will be an index on every column plus the foreign key (20 indexes for 20 columns). See the sketch after this list.
3. Optionally, a metadata table that describes what the generic columns mean for each message type.
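A minimal sketch of that layout in Oracle (all table and column names here are only illustrative):
create table messages_common (
    msg_id      number(14) primary key,
    msg_type    varchar2(50) not null,
    msg_name    varchar2(100),
    sender      varchar2(100),
    receiver    varchar2(100),
    insert_date date default sysdate
);
create table messages_extra (
    msg_id  number(14) constraint messages_extra_fk references messages_common(msg_id),
    column1 varchar2(400),
    column2 varchar2(400),
    -- ... and so on up to column20
    constraint messages_extra_pk primary key (msg_id)
);
create index messages_extra_c1_ix on messages_extra(column1);
create index messages_extra_c2_ix on messages_extra(column2);
-- ... one index per generic column
create table messages_meta (
    msg_type    varchar2(50),
    column_name varchar2(30),
    description varchar2(200),
    constraint messages_meta_pk primary key (msg_type, column_name)
);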
UPDATE:
Let's say that I am a backbone for passing data, and there are 100 different kinds of data (messages). I want to store every message that passes through me, but not as bulk data, because later I would like to query the data based on the unique columns of each message type.
Is there a better way out there that I don't know about?
Thanks.
BTW, the database is Oracle.
UPDATE 2:
Can a NoSQL database give me a better solution?

The 2- or 3-table approach that you suggest seems reasonable to me.
An alternative would be to store all those unique columns in the same table along with the common data. That would not prevent you from creating indexes on them.
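For example, a single wide table might look something like this (a sketch only; the type-specific columns are nullable and filled only for the relevant message type, and the names are illustrative):
create table messages (
    msg_id          number(14) primary key,
    msg_type        varchar2(50) not null,
    sender          varchar2(100),
    receiver        varchar2(100),
    insert_date     date default sysdate,
    -- type-specific columns, left null for messages of other types
    tracking_number varchar2(50),
    cargo_weight    number
);
create index messages_tracking_ix on messages(tracking_number);
create index messages_cargo_ix on messages(cargo_weight);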

Here is your answer:
You can store as many messages as you want against each message type.
UPDATE:
This is exactly what you want. There are different message types, as you stated, e.g. tracking, cargo or whatever. These categories will be stored in the msg_type table. You can store as many categories in msg_type as you want.
Then each msg_type has numerous messages, which will be stored in the Messages table. You can store as many messages here as you want, without any limit.
Here is your database SQL:
create table msg_type(
type_id number(14) primary key,
type varchar2(50)
);
create sequence msg_type_seq
start with 1 increment by 1;
-- populate type_id from the sequence on insert
create or replace trigger msg_type_trig
before insert on msg_type
referencing new as new
for each row
begin
select msg_type_seq.nextval into :new.type_id from dual;
end;
/
create table Messages(
msg_id number(14) primary key,
type_id number(14) constraint Messages_fk references msg_type(type_id),
msg_date timestamp(0) default sysdate,
msg varchar2(3900));
create sequence Messages_seq
start with 1 increment by 1;
-- populate msg_id from the sequence on insert
create or replace trigger Messages_trig
before insert on Messages
referencing new as new
for each row
begin
select Messages_seq.nextval into :new.msg_id from dual;
end;
/
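A hypothetical usage example, relying on the triggers above to fill in the generated IDs:
insert into msg_type (type) values ('tracking');
insert into msg_type (type) values ('cargo');
insert into Messages (type_id, msg)
values ((select type_id from msg_type where type = 'tracking'),
        'container 1234 left the port');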

Related

Avoid having over 1500 columns in Postgres table

The situation is:
I have many different seeds.
Each seed passed through a black box will spit out values for over 1500 unique categories. Passing the same seed will result in the same values for the same respective categories.
I have trouble creating a relation of tables without having a table with over 1500 columns, one per category.
This is more of a math-y problem but I don't know where else to post the problem.
I would suggest having the category itself as a column.
You could then add a unique constraint on that category (or a tuple that uniquely identifies a row for your purposes).
You can then handle exceptions based on that constraint violation in your driver script (or similar) to continue on (e.g. in that case, it sounds like it should be a no-op, since that row already exists).
As an optimization, you could (and I would say should) add an exists check to the query to see if the entry already exists such that, in the case it does, it would be a no-op.
Based on the info you have provided, I don't see a need to have 1500 columns.
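A minimal sketch of that layout, assuming PostgreSQL (the table and column names are hypothetical); ON CONFLICT DO NOTHING, available from 9.5 onward, gives the no-op behaviour described above:
CREATE TABLE seed_values (
    seed     text NOT NULL,
    category text NOT NULL,
    value    numeric NOT NULL,
    UNIQUE (seed, category)
);
-- re-running the same seed/category pair is a no-op instead of an error
INSERT INTO seed_values (seed, category, value)
VALUES ('Seed1', '1', 6)
ON CONFLICT (seed, category) DO NOTHING;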
If I understand you correctly, Postgres already has a solution for your problem.
The way to avoid having over 1500 columns in a Postgres table is to use a column of type json.
For example, creating a table with a json column:
CREATE TABLE seeds (
ID serial NOT NULL PRIMARY KEY,
info json NOT NULL
);
Here is how to insert data into it:
INSERT INTO seeds (info)
VALUES (
'{ "seed": "Seed1", "items": {"category": "1","qty": 6}}'
);
You can also query against that column, like this:
SELECT * FROM seeds WHERE info -> 'items' ->> 'category' = '1';
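If those lookups need to be fast, one option (a sketch, assuming the structure from the insert above) is an expression index on the extracted value:
CREATE INDEX seeds_category_idx ON seeds ((info -> 'items' ->> 'category'));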
Perhaps this will help you

Suggestions for chat system schema design

I need a suggestion about an SQL table schema. I've created a table named Chats. Would it be better for me to add two columns (like ID and Messages), or one column that will contain both the IDs and the messages? And which one of them will work faster?
Personally I'd model this as two tables:
Chats
- ID
- Name
Messages
- ID
- ChatID
- Message
- SentDate
There should be a foreign key from Messages.ChatID to Chats.ID.
Otherwise you're going to have to create duplicate chats each time someone sends a message.
I would strongly recommend against keeping IDs and values in the same column; it makes it nearly impossible to join on and will create all sorts of problems later on.
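A minimal sketch of that schema (SQL Server-style syntax is assumed here, since no DBMS was specified):
CREATE TABLE Chats (
    ID int IDENTITY(1,1) PRIMARY KEY,
    Name varchar(100) NOT NULL
);
CREATE TABLE Messages (
    ID int IDENTITY(1,1) PRIMARY KEY,
    ChatID int NOT NULL REFERENCES Chats(ID),
    Message varchar(4000) NOT NULL,
    SentDate datetime NOT NULL DEFAULT GETDATE()
);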
There is no reason to use a single column. Add a separate column for each piece of data, each with its own data type, because you will then be able to filter and sort the table by each column later. You will also be able to add constraints, indexes, statistics, etc. if needed.
Any query performed on that table will work faster if you use separate columns.

What is the best way to keep changes history to database fields?

For example, I have a table which stores details about properties, which could have owners, a value, etc.
Is there a good design for keeping the history of every change to owner and value? I want to do this for many tables, kind of like an audit of the table.
What I thought was keeping a single table with fields
table_name, field_name, prev_value, current_val, time, user.
But it looks kind of hacky and ugly. Is there a better design?
Thanks.
There are a few approaches
Field based
audit_field (table_name, id, field_name, field_value, datetime)
This one can capture the history of all tables and is easy to extend to new tables. No structural changes are necessary for new tables.
Field_value is sometimes split into multiple fields to natively support the actual field type from the original table (but only one of those fields will be filled, so the data is denormalized; a variant is to split the above table into one table for each type).
Other metadata such as field_type, user_id, user_ip, action (update, delete, insert), etc. can be useful.
The structure of such records will most likely need to be transformed to be used.
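A minimal DDL sketch of such a table, assuming Oracle (matching the question) and the pseudo-definition above:
create table audit_field (
    table_name  varchar2(30) not null,
    id          number not null,         -- primary key value of the audited row
    field_name  varchar2(30) not null,
    field_value varchar2(4000),          -- denormalized: every value stored as text
    datetime    date not null,
    user_id     number                   -- optional extra metadata
);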
Record based
audit_table_name (timestamp, id, field_1, field_2, ..., field_n)
For each record type in the database, create a generalized table that has all the same fields as the original record, plus a versioning field (additional metadata is again possible). One table for each working table is necessary. The process of creating such tables can be automated.
This approach provides you with a semantically rich structure very similar to the main data structure, so the tools used to analyze and process the original data can easily be used on this structure, too.
Log file
The first two approaches usually use tables which are very lightly indexed (or not indexed at all, with no referential integrity) so that the write penalty is minimized. Still, sometimes a flat log file might be preferred, though functionality is of course greatly reduced. (It basically depends on whether you want an actual audit/log that will be analyzed by some other system, or whether the historical records are part of the main system.)
A different way to look at this is to time-dimension the data.
Assuming your table looks like this:
create table my_table (
my_table_id number not null primary key,
attr1 varchar2(10) not null,
attr2 number null,
constraint my_table_ak unique (attr1, attr2) );
Then if you changed it like so:
create table my_table (
my_table_id number not null,
attr1 varchar2(10) not null,
attr2 number null,
effective_date date not null,
is_deleted number(1,0) default 0 not null,
constraint my_table_ak unique (attr1, attr2, effective_date),
constraint my_table_pk primary key (my_table_id, effective_date) );
You'd be able to have a complete running history of my_table, online and available. You'd have to change the paradigm of the programs (or use database triggers) to intercept UPDATE activity into INSERT activity, and to change DELETE activity into updating the IS_DELETED flag.
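One way to do the intercepting with triggers is a view over the current rows plus an INSTEAD OF trigger; this is only a sketch, and the view and trigger names are illustrative:
create or replace view my_table_current as
select my_table_id, attr1, attr2, effective_date
  from my_table t
 where is_deleted = 0
   and effective_date = (select max(effective_date)
                           from my_table
                          where my_table_id = t.my_table_id);
create or replace trigger my_table_current_upd
instead of update on my_table_current
for each row
begin
  -- an UPDATE through the view becomes a new version row in the base table
  insert into my_table (my_table_id, attr1, attr2, effective_date, is_deleted)
  values (:old.my_table_id, :new.attr1, :new.attr2, sysdate, 0);
end;
/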
Unreason:
You are correct that this solution similar to record-based auditing; I read it initially as a concatenation of fields into a string, which I've also seen. My apologies.
The primary differences I see between the time-dimensioning the table and using record based auditing center around maintainability without sacrificing performance or scalability.
Maintainability: One needs to remember to change the shadow table if making a structural change to the primary table. Similarly, one needs to remember to make changes to the triggers which perform change-tracking, as such logic cannot live in the app. If one uses a view to simplify access to the tables, you've also got to update it, and change the instead-of trigger which would be against it to intercept DML.
In a time-dimensioned table, you make the structural change you need to, and you're done. As someone who's been the FNG on a legacy project, such clarity is appreciated, especially if you have to do a lot of refactoring.
Performance and Scalability: If one partitions the time-dimensioned table on the effective/expiry date column, the active records are in one "table", and the inactive records are in another. Exactly how is that less scalable than your solution? "Deleting" an active record involves row movement in Oracle, which is a delete-and-insert under the covers - exactly what the record-based solution would require.
The flip side of performance is that if the application is querying for a record as of some date, partition elimination allows the database to search only the table/index where the record could be; a view-based solution to search active and inactive records would require a UNION-ALL, and not using such a view requires putting the UNION-ALL in everywhere, or using some sort of "look-here, then look-there" logic in the app, to which I say: blech.
In short, it's a design choice; I'm not sure either's right or either's wrong.
In our projects we usually do it this way:
You have a table
properties(ID, value1, value2)
then you add table
properties_audit(ID, RecordID, timestamp or datetime, value1, value2)
ID - the id of the history record (not really required).
RecordID - points to the record in the original properties table.
When you update the properties table, you add a new record to properties_audit containing the previous values of the record being updated in properties. This can be done using triggers or in your DAL.
After that you have the latest value in properties and all the history (previous values) in properties_audit.
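A minimal sketch of the trigger approach, assuming Oracle syntax (column types are illustrative):
create table properties (
    ID     number primary key,
    value1 varchar2(100),
    value2 varchar2(100)
);
create table properties_audit (
    RecordID   number,
    changed_at timestamp default systimestamp,
    value1     varchar2(100),
    value2     varchar2(100)
);
create or replace trigger properties_audit_trg
before update on properties
for each row
begin
  -- copy the previous values into the audit table
  insert into properties_audit (RecordID, value1, value2)
  values (:old.ID, :old.value1, :old.value2);
end;
/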
I think a simpler schema would be
table_name, field_name, value, time, userId
There is no need to save both current and previous values in the audit tables. When you make a change to any of the fields, you just add a row to the audit table with the changed value. This way you can always sort the audit table by time and know what the value of the field was prior to your change.

trigger insertions into same table

I have many tables in my database which are interrelated. I have a table (table one) which has had data inserted, and the ID auto-increments. Once that row has an ID, I want to insert it into a table (table three) along with another set of IDs which come from a form (this data will also be going into a table, so it could come from that table) - the same form that the data which went into the first table came from.
The two IDs together make up the primary key of the third table.
How can I do this? It's to show that more than one ID is joined to a single ID for something else.
Thanks.
You can't do that through a trigger, as the trigger only has available to it the data that you already inserted, not data that is currently only residing in your user interface.
Normally the way you handle this situation is to write a stored proc that inserts the meeting and returns the ID value (using scope_identity() in SQL Server, but I'm sure other databases have a method to return the auto-generated ID as well). Then you would use that value to insert into the other table along with the other values you need for that table. You would of course want to wrap the whole thing in a transaction.
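A minimal T-SQL sketch of that pattern (the table, column and procedure names here are only illustrative):
CREATE PROCEDURE InsertMeetingWithAttendee
    @MeetingName varchar(100),
    @PersonID int
AS
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO Meetings (Name) VALUES (@MeetingName);

    -- id generated by the identity column on Meetings
    DECLARE @MeetingID int;
    SET @MeetingID = SCOPE_IDENTITY();

    INSERT INTO MeetingAttendees (MeetingID, PersonID)
    VALUES (@MeetingID, @PersonID);

    COMMIT TRANSACTION;
END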
I think you can probably do what you're describing (just write the INSERTs to table 3 in the table 1 trigger), but you'll have to put the additional info for the table 3 rows into your table 1 row, which isn't very smart.
I can't see why you would do that instead of writing the INSERTs in your code, where someone reading it can see what's happening.
The trouble with triggers is that they make it easy to hide business logic in the database. I think (and I believe I'm in the majority here) that it's easier to understand, manage, maintain and generally all-round deal with an application where all the business rules exist in the same general area.
There are reasons to use triggers (for propagating denormalised values, for example), just as there are reasons for using stored procedures. I'm going to assert that they are largely related to performance-critical areas. Or should be.

Fixed-size array database field

I need to store several date values in a database field. These values will be tied to a "User" such that each user will have their own unique set of these several date values.
I could use a one-to-many relationship here, but each user will have exactly 4 date values tied to them, so I feel that a one-to-many table would be overkill (in many ways, e.g. speed). But if I needed to query against them, I would need those 4 values to be in different fields, e.g. MyDate1, MyDate2, etc., and then the SQL command to fetch them would have to check 4 fields each time.
So the one-to-many relationship would probably be the best solution, but is there a better/cleaner/faster way? Am I designing it correctly?
The platform is MS SQL 2005, but a solution on any platform will do; I'm mostly looking for proper DB design techniques.
EDIT: The 4 fields represent 4 instances of the same thing.
If you do it as four separate fields, then you don't have to join. To save the query syntax from being too horrible, you could write:
SELECT * FROM MyTable WHERE 'DateLiteral' IN (MyDate1, MyDate2, MyDate3, MyDate4);
As mentioned in comments, the IN operator is pretty specific when it comes to date fields (down to the last (milli)second). You can always use date/time functions on the values, but BETWEEN is unusable:
SELECT * FROM MyTable WHERE date_trunc('hour', 'DateLiteral')
IN (date_trunc('hour', MyDate1), date_trunc('hour', MyDate2), date_trunc('hour', MyDate3), date_trunc('hour', MyDate4));
Some databases, like Firebird, have an array datatype, which does exactly what you described. It is declared something like this:
alter table t1 add MyDate date[4];
For what it's worth, the normalized design would be to store the dates as rows in a dependent table.
Storing multiple values in a single column is not a normalized design; normalization explicitly means each column has exactly one value.
You can make sure no more than four rows are inserted into the dependent table this way:
CREATE TABLE ThisManyDates (n INT PRIMARY KEY);
INSERT INTO ThisManyDates VALUES (1), (2), (3), (4);
CREATE TABLE UserDates (
User_ID INT REFERENCES Users,
n INT REFERENCES ThisManyDates,
Date_Value DATE NOT NULL,
PRIMARY KEY (User_ID, n)
);
However, this design doesn't allow you to make the date values mandatory.
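Hypothetical usage, storing four dates for the user with ID 42:
INSERT INTO UserDates (User_ID, n, Date_Value) VALUES (42, 1, '2010-01-01');
INSERT INTO UserDates (User_ID, n, Date_Value) VALUES (42, 2, '2010-02-15');
INSERT INTO UserDates (User_ID, n, Date_Value) VALUES (42, 3, '2010-03-20');
INSERT INTO UserDates (User_ID, n, Date_Value) VALUES (42, 4, '2010-04-30');
-- a fifth date is rejected, because 5 is not in ThisManyDates
INSERT INTO UserDates (User_ID, n, Date_Value) VALUES (42, 5, '2010-05-01');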
How about having 4 fields along with the User ID (if you are sure it won't exceed that)?
Create four date fields and store the dates in the fields. The date fields might be part of your user table, or they might be in some other table joined to the user table in a one-to-one relationship. It's your call.
