Avoid having over 1500 columns in a Postgres table

The situation is:
I have many different seeds.
Each seed passed through a black box will spit out values for over 1500 unique categories. Passing the same seed will result in the same values for the same respective categories.
I have trouble creating a relation of tables without having a table with over 1500 columns, one per category.
This is more of a math-y problem but I don't know where else to post the problem.

I would suggest having the category itself as a column.
You could then add a unique constraint on that category (or on a tuple that uniquely identifies a row for your purposes).
You can then handle exceptions from that constraint violation in your driver script (or similar) and continue on (in that case, it sounds like it should be a no-op, since the row already exists).
As an optimization, you could (and I would say should) add an exists check to the query, so that if the entry already exists the statement is a no-op.
Based on the info you have provided, I don't see a need to have 1500 columns.
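The constraint-plus-no-op pattern described above can be sketched with an upsert clause. This is a minimal sketch, assuming one row per (seed, category); the table and column names are hypothetical, and SQLite (3.24+) is used here because it accepts the same ON CONFLICT ... DO NOTHING syntax as Postgres:

```python
import sqlite3

# One row per (seed, category) instead of 1500+ columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seed_values (
        seed     TEXT NOT NULL,
        category TEXT NOT NULL,
        value    REAL NOT NULL,
        UNIQUE (seed, category)
    )
""")

def record(seed, category, value):
    # ON CONFLICT ... DO NOTHING turns a duplicate insert into a no-op,
    # so the driver script never has to catch the violation itself.
    conn.execute(
        "INSERT INTO seed_values (seed, category, value) "
        "VALUES (?, ?, ?) ON CONFLICT (seed, category) DO NOTHING",
        (seed, category, value),
    )

record("Seed1", "cat_1", 6.0)
record("Seed1", "cat_1", 6.0)   # duplicate: silently skipped
record("Seed1", "cat_2", 3.5)

rows = conn.execute("SELECT COUNT(*) FROM seed_values").fetchone()[0]
print(rows)  # 2
```

With the unique constraint doing the deduplication, re-running the same seed through the black box is idempotent by construction.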

If I understand you correctly, Postgres already has a solution for your problem: the way to avoid having over 1500 columns in a Postgres table is to use a column of type json (or jsonb, which is generally preferred when you need to query and index the contents).
For example, creating a table with a json column:
CREATE TABLE seeds (
ID serial NOT NULL PRIMARY KEY,
info json NOT NULL
);
Inserting data:
INSERT INTO seeds (info)
VALUES (
'{ "seed": "Seed1", "items": {"category": "1","qty": 6}}'
);
You can also query inside these columns, like this:
SELECT * FROM seeds WHERE info -> 'items' ->> 'category' = '1';
Perhaps this will help you.
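The json-column approach above can be sketched end to end; this sketch uses SQLite's JSON1 functions (available in modern builds) in place of Postgres's `->`/`->>` operators, and the table and document shape mirror the hypothetical example above:

```python
import json
import sqlite3

# Store the whole per-seed document in one JSON column instead of
# 1500+ dedicated columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seeds (id INTEGER PRIMARY KEY, info TEXT NOT NULL)")

doc = {"seed": "Seed1", "items": {"category": "1", "qty": 6}}
conn.execute("INSERT INTO seeds (info) VALUES (?)", (json.dumps(doc),))

# json_extract plays the role of Postgres's -> / ->> operators here.
row = conn.execute(
    "SELECT info FROM seeds WHERE json_extract(info, '$.items.category') = '1'"
).fetchone()
print(row is not None)  # True
```

In Postgres you would additionally put a GIN index on a jsonb column to make such lookups fast; SQLite has no direct equivalent, so this is only a shape-of-the-query sketch.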

Related

Ways to store different kinds of messages in a DB

I am wondering what are the possibilities for storing this kind of data in an efficient way.
Let's say I have 100 kinds of different messages that I need to store. All messages have common data like message ID, message name, sender, receiver, insert date, etc. Every kind of message has its own unique columns that we want to store and index (for quick queries).
We don't want to use 100 different tables; it would be impossible to work with.
The best way that I could come up with is to use 2-3 tables:
1. for the common data.
2. for the extra unique data, where every column has a generic name: foreign key, column1, column2, ..., column20. There will be an index on every column plus the foreign key (20 indexes for 20 columns).
3. optional, metadata table to describe the generic columns for every unique message.
UPDATE:
Let's say that I am a backbone for passing data, and there are 100 different kinds of data (messages). I want to store every message that comes through me, but not as bulk data, because later I would like to query the data based on the unique columns of each message type.
Is there a better way out there that I don't know about?
Thanks.
BTW the database is oracle.
UPDATE 2:
Can a NoSQL database give me a better solution?
The approach of 2 or 3 tables that you suggest seems reasonable to me.
An alternative way would be to just store all those unique columns in the same table along with common data. That should not prevent you from creating indexes for them.
Here is your answer:
You can store as many messages as you want against each message type.
UPDATE:
This is exactly what you want. There are different message types, as you stated, e.g. tracking, cargo, or whatever. These categories will be stored in the msg_type table. You can store as many categories in msg_type as you want.
Then each msg_type has numerous messages, which will be stored in the Messages table. You can store as many messages here as you want, without any limit.
Here is your database SQL:
create table msg_type(
type_id number(14) primary key,
type varchar2(50)
);
create sequence msg_type_seq
start with 1 increment by 1;
create or replace trigger msg_type_trig
before insert on msg_type
referencing new as new
for each row
begin
select msg_type_seq.nextval into :new.type_id from dual;
end;
/
create table Messages(
msg_id number(14) primary key,
type_id number(14) constraint Messages_fk references msg_type(type_id),
msg_date timestamp(0) default sysdate,
msg varchar2(3900));
create sequence Messages_seq
start with 1 increment by 1;
create or replace trigger Messages_trig
before insert on Messages
referencing new as new
for each row
begin
select Messages_seq.nextval into :new.msg_id from dual;
end;
/
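The sequence-plus-trigger pairing in the DDL above exists only to emulate auto-assigned keys. A minimal sketch of the same two-table shape, using SQLite where INTEGER PRIMARY KEY auto-assigns ids (the data is hypothetical):

```python
import sqlite3

# The msg_type / Messages pair from the answer above, sketched in SQLite;
# INTEGER PRIMARY KEY stands in for the Oracle sequence + trigger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE msg_type (
        type_id INTEGER PRIMARY KEY,
        type    TEXT
    );
    CREATE TABLE messages (
        msg_id  INTEGER PRIMARY KEY,
        type_id INTEGER REFERENCES msg_type(type_id),
        msg     TEXT
    );
""")

cur = conn.execute("INSERT INTO msg_type (type) VALUES ('tracking')")
tracking_id = cur.lastrowid
conn.execute("INSERT INTO messages (type_id, msg) VALUES (?, ?)",
             (tracking_id, "container left port"))
conn.execute("INSERT INTO messages (type_id, msg) VALUES (?, ?)",
             (tracking_id, "container arrived"))

count = conn.execute(
    "SELECT COUNT(*) FROM messages WHERE type_id = ?", (tracking_id,)
).fetchone()[0]
print(count)  # 2
```

Note that in Oracle 12c and later the same effect is available natively with `GENERATED AS IDENTITY` columns, without any trigger.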

How to store dashboard definitions in oracle database

I am creating a system for displaying dashboards. A dashboard is made up of a number of dashlets of different types e.g. a trend, a histogram, a spiderplot etc.
Currently I define each dashlet as a row within an Oracle table. My dilemma is as follows. Each type of dashlet has a different set of parameters, such as line color, y-axis title, maximum y, etc. Up to now I have been creating a different column for each parameter, but this means that I have a very large number of columns, and many of the columns are not relevant for a particular dashlet and are left empty. I have now tried using a single column called definitions, which contains information defining the characteristics of the dashlet. Example below.
ytitle: Count|
linecolor: Yellow|
linethickness: 12|
.....
The problem with this is that if you mis-spell an item the program will fail at runtime.
What is the best way to tackle this problem?
You can create a table, let's say t_parameters, where the parameter name (ytitle, linecolor) is the primary or unique key. Then you can create a foreign key on the parameter_name column in your definition table (the one storing assignments: ytitle Count, etc.).
Now if you also want to ensure that the parameter value comes from an exact list, you can do the same by creating a table of parameter values with a unique key, and a foreign key in the definition table.
Then, if you need it to be more advanced and check which parameter can take which values, you can create a lookup table with columns parameter_name, parameter_value, like:
linecolor; yellow
linecolor; red
Ytitle; sum
Ytitle; count
This is one way to ensure referential integrity.
Best practice would be to give each parameter_name in t_parameter a numeric id, make that the PK, and reference it in the lookup tables.
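The lookup-table idea above can be sketched with a composite foreign key, so that a mis-spelled parameter fails at insert time rather than silently at dashboard render time. Table and column names are hypothetical; SQLite is used, where FK enforcement must be switched on explicitly:

```python
import sqlite3

# (parameter_name, parameter_value) pairs form the allowed combinations;
# definitions reference them via a composite foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE allowed_parameter_values (
        parameter_name  TEXT NOT NULL,
        parameter_value TEXT NOT NULL,
        PRIMARY KEY (parameter_name, parameter_value)
    );
    CREATE TABLE definitions (
        dashlet_id      INTEGER NOT NULL,
        parameter_name  TEXT NOT NULL,
        parameter_value TEXT NOT NULL,
        FOREIGN KEY (parameter_name, parameter_value)
            REFERENCES allowed_parameter_values (parameter_name, parameter_value)
    );
""")
conn.execute("INSERT INTO allowed_parameter_values VALUES ('linecolor', 'yellow')")

conn.execute("INSERT INTO definitions VALUES (1, 'linecolor', 'yellow')")  # ok
try:
    # A mis-spelled parameter is now rejected by the database itself.
    conn.execute("INSERT INTO definitions VALUES (1, 'linecolour', 'yellow')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```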

How to store row modification data in Postgres?

So I need to find a way to track modifications made to database rows. I'm thinking the best way to do this is to store object deltas for each row after a modification has been made. So if I had this as a table:
CREATE TABLE global.users
(
id serial NOT NULL,
username text,
password text,
email text,
admin boolean,
CONSTRAINT users_pkey PRIMARY KEY (id)
) WITH (
OIDS=FALSE
);
ALTER TABLE global.users
OWNER TO postgres;
So say I updated the username field from "support" to "admin", then I'd save a JSON object like:
{
"username": ["support", "admin"]
}
associated with a specific row ID.
So my question is as follows: what's a nice way to organize these objects in Postgres? I'm currently debating between either a) having a deltas table for every existing table in the database (so this object would go into a table called global.users_delta or similar) or b) having a global deltas table which holds all deltas for all objects and tracks which table each is associated with.
I haven't really been able to identify any best practices for doing this sort of thing as yet, so even some direction towards preexisting documentation would be nice.
EDIT: An added requirement here is the issue of how to deal with the related data. So say, something belongs to a category, which is stored in another table. Usually that would be referenced by ID, so the delta would track the change in category ID's numeric value. That value needs to be labeled somehow, but that label can't necessarily be retroactively applied (in case the other linked value changes say). Should the labeled value be stored or should the raw value be stored? Or maybe both?
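The per-column delta format described in the question (old/new value pairs keyed by column name) can be sketched as a small pure function; the column names here are just the ones from the example table:

```python
import json

def row_delta(before, after):
    """Return {column: [old, new]} for every column whose value changed.

    `before` and `after` are dicts of column -> value for one row;
    the output matches the JSON delta format described above.
    """
    return {
        col: [before.get(col), after.get(col)]
        for col in set(before) | set(after)
        if before.get(col) != after.get(col)
    }

before = {"id": 1, "username": "support", "email": "a@example.com"}
after  = {"id": 1, "username": "admin",   "email": "a@example.com"}

delta = row_delta(before, after)
print(json.dumps(delta))  # {"username": ["support", "admin"]}
```

Whichever storage layout is chosen (per-table delta tables or one global one), the JSON body itself stays the same; only the row that points back at the modified record differs.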

How to delete a row with primary key id?

I have a SQL table user. It has 3 columns: id (set as primary key with Is Identity set to Yes), name, and password. When I enter data into the table, the id is incremented automatically. But a delete query only deletes the name and password. I need to delete a particular row including the id.
For example:
id:1 name:abc password:123
id:2 name:wer password:234
id:3 name:lkj password:222
id:4 name:new password:999
I need to delete the third row, i.e., id:3 name:lkj password:222. But after deleting this row, the table should look like this:
id:1 name:abc password:123
id:2 name:wer password:234
id:3 name:new password:999
The additional information you have provided shows you do not understand the IDENTITY property. As others, including myself, have said, numbers are not re-used.
You should also avoid changing primary keys just because a row was deleted.
It would seem you need a row number; don't use the key for this. Create a view using the ROW_NUMBER function, something like:
SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_number, name, password, ...
FROM [Your_Table]
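The ROW_NUMBER view idea can be sketched directly against the example data from the question: the stored ids keep their gap, but the query presents a dense numbering. SQLite 3.25+ (bundled with recent Pythons) supports the same window function:

```python
import sqlite3

# ids keep their gaps after a delete; ROW_NUMBER() renumbers on the fly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, password TEXT);
    INSERT INTO users VALUES (1, 'abc', '123'), (2, 'wer', '234'),
                             (3, 'lkj', '222'), (4, 'new', '999');
    DELETE FROM users WHERE id = 3;    -- leaves a gap at id 3
""")

rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_number, name
    FROM users
""").fetchall()
print(rows)  # [(1, 'abc'), (2, 'wer'), (3, 'new')]
```

This gives exactly the display the asker wanted, without ever mutating the primary key.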
As Tony said, once a number has been used, it isn't available anymore. A workaround for this problem is the following:
1. Don't use an Identity field at all. Use just an integer field set as primary key.
2. Declare a trigger which is triggered whenever a new row is inserted.
3. This trigger has to read the ID of the last inserted row in the table, increment it by one, and insert the result into the ID field.
That way, when you delete a row later, its ID becomes available again.
If you want to reuse the id later, that is an extremely poor idea. Don't go down that path. The only ways to do it either cause performance problems or are very prone to error when you have race conditions. There is a reason why identities don't reuse values, after all. The id should be meaningless anyway. There is usually no reason why it can't skip values, except personal preference. But personal preference should not take precedence over performance and reliability. If you want to do this because you hate the skipped values, then don't. If you are getting this requirement from above, then push back. Tell them that the alternatives are more time-consuming, less reliable, and far more likely to cause data integrity problems.

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList(
ListName varchar(15),
Value varchar(15),
Display varchar(15),
Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
It depends on various things:
if they are immutable and non-relational (think "names of US states"), an argument could be made that they should not be in the database at all: after all, they are simply a formatting of something simpler (like the assigned two-character code). This has the added advantage that you don't need a round trip to the DB to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In the case of localized display, where you need a different formatting for each culture, you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList  IdPick  Text
1       1       Apples
1       2       Oranges
1       3       Pears
2       1       Dogs
2       2       Cats
and optionally..
PickList
Id  Description
1   Fruit
2   Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering out based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field; do you now split this data into an additional table, or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity, and it makes it easier to interpret the data in the database when you're not using the application.
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
Create one table for lists and one table for list_options.
# Put in the name of the list
insert into lists (id, name) values (1, "Country in North America");
# Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, "Canada"),
(2, 1, "United States of America"),
(3, 1, "Mexico");
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID), which you join to the list_option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value.
The query would look something like:
select
b.option_name,
b.option_value
from
list a,
list_option b
where
a.name = 'States'
and
a.list_id = b.list_id
order by
b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or if it is often changing, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id in value and populate a single table with :
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the initial option as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model that you merely want to persist in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .NET world), or depending on the data you could store it manually, one table each. There are situations where it would make good sense to put it all in the same table, but consider this only if you feel it makes really good sense. Normally, putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: it's really easy to add pick lists. Disadvantage: it's a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist - it needs an additional attribute, for instance - you can change just that picklist table with little effect on the rest of your schema. In the single-table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate-table solution.
This has served us well:
SQL> desc aux_values;
Name           Type
-------------  ------------
VARIABLE_ID    VARCHAR2(20)
VALUE_SEQ      NUMBER
DESCRIPTION    VARCHAR2(80)
INTEGER_VALUE  NUMBER
CHAR_VALUE     VARCHAR2(40)
FLOAT_VALUE    FLOAT(126)
ACTIVE_FLAG    VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.
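The one-typed-column-per-datatype pattern above can be sketched in a few lines; the sample variable and values are hypothetical, and SQLite stands in for Oracle:

```python
import sqlite3

# Sketch of the aux_values pattern: one row per list entry, with one
# typed value column per datatype (only one is filled in per row).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE aux_values (
        variable_id   TEXT NOT NULL,
        value_seq     INTEGER NOT NULL,
        description   TEXT,
        integer_value INTEGER,
        char_value    TEXT,
        float_value   REAL,
        active_flag   TEXT DEFAULT 'Y'
    )
""")
conn.executemany(
    "INSERT INTO aux_values (variable_id, value_seq, description, char_value) "
    "VALUES (?, ?, ?, ?)",
    [("CUSTOMER_STATUS", 1, "Active customer", "ACTIVE"),
     ("CUSTOMER_STATUS", 2, "Closed account", "CLOSED")],
)

# Populating a pick list is one query filtered by variable_id.
statuses = [r[0] for r in conn.execute(
    "SELECT char_value FROM aux_values "
    "WHERE variable_id = 'CUSTOMER_STATUS' AND active_flag = 'Y' "
    "ORDER BY value_seq"
)]
print(statuses)  # ['ACTIVE', 'CLOSED']
```

The active_flag lets you retire an option without deleting rows that historical data may still reference.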
