PostgreSQL: Merge records within a table - database

I have one large table which has the following structure:
As you can see in the image above, the primary key consists of the intervalstart and the edgeid and therefore, there are no duplicate entries within this table.
What I want to do now is to update some edgeids as some of them are deprecated.
For example, I want to update ALL records which have the edgeid "E304178540From" with the new edgeid "E304178582From". As you can see in the image above, this will fail because I would have created a duplicate (but with different values for avgvelocity, measurementcount and vehiclecount).
So as a solution, I want to "merge" those records (in this example the first two entries in the image above) and calculate new values for the avgvelocity, measurementcount and vehiclecount (by calculating the average).
So that it looks like this:
intervalstart | day | edgeid | avgvelocity | measurementcount | vehiclecount
2014-01-01 00:00:00 | 3 | E304178582From | 85 | 1 | 120
Any suggestions on how to solve this?
Thank you for any help you can give me!

One option is to use the ON CONFLICT clause of the INSERT command. The syntax is as follows:
INSERT INTO table_name(column_list) VALUES(value_list)
ON CONFLICT target action;
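So, try inserting a new record that will lead to a conflict (based on the timestamp and edgeid), and calculate the new average etc. in the ON CONFLICT ... DO UPDATE clause. Inside that clause, the EXCLUDED pseudo-table refers to the row proposed for insertion, while the values of the existing row are referenced through the table name.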
Please refer to the documentation here for more information:
https://www.postgresql.org/docs/9.5/static/sql-insert.html
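As a rough sketch of how this could look end to end: copy the rows of the deprecated edgeid under the new edgeid, let ON CONFLICT average the values where a row already exists, then delete the old rows. The example uses Python's built-in sqlite3 module as a stand-in, since SQLite's upsert syntax is modeled on PostgreSQL's. The table name traffic, the column types, and the input values are assumptions; the values were made up so that the first interval produces the row from the question.

```python
import sqlite3

# In-memory stand-in for the table in the question; name and types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE traffic (
        intervalstart    TEXT    NOT NULL,
        day              INTEGER NOT NULL,
        edgeid           TEXT    NOT NULL,
        avgvelocity      REAL    NOT NULL,
        measurementcount REAL    NOT NULL,
        vehiclecount     REAL    NOT NULL,
        PRIMARY KEY (intervalstart, edgeid)
    )
    """
)
# Hypothetical sample data: the first interval exists for both edgeids,
# the second only for the deprecated one.
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2014-01-01 00:00:00", 3, "E304178540From", 80, 1, 100),
        ("2014-01-01 00:00:00", 3, "E304178582From", 90, 1, 140),
        ("2014-01-01 00:15:00", 3, "E304178540From", 70, 2, 50),
    ],
)


def merge_edgeid(conn, old_id, new_id):
    """Re-insert old_id rows as new_id, averaging on conflict, then drop the old rows."""
    with conn:  # both statements run in one transaction
        conn.execute(
            """
            INSERT INTO traffic (intervalstart, day, edgeid,
                                 avgvelocity, measurementcount, vehiclecount)
            SELECT src.intervalstart, src.day, ?,
                   src.avgvelocity, src.measurementcount, src.vehiclecount
            FROM traffic AS src
            WHERE src.edgeid = ?
            ON CONFLICT (intervalstart, edgeid) DO UPDATE SET
                avgvelocity      = (traffic.avgvelocity + excluded.avgvelocity) / 2.0,
                measurementcount = (traffic.measurementcount + excluded.measurementcount) / 2.0,
                vehiclecount     = (traffic.vehiclecount + excluded.vehiclecount) / 2.0
            """,
            (new_id, old_id),
        )
        conn.execute("DELETE FROM traffic WHERE edgeid = ?", (old_id,))


merge_edgeid(conn, "E304178540From", "E304178582From")
rows = conn.execute("SELECT * FROM traffic ORDER BY intervalstart").fetchall()
for row in rows:
    print(row)
```

The same two statements should carry over to PostgreSQL 9.5 or later with the driver's own placeholder style. Note that this takes the plain mean of the two rows; weighting avgvelocity by measurementcount would be a different design choice.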

Related

Creating a Postgres sequence for each foreign key as a default parameter?

I am trying to build a journal that keeps track of accounts. It's append-only, and each account should have its own sequence. For example:
sequence_nbr | account_id
1 | act_1
2 | act_1
1 | act_2
1 | act_3
2 | act_2
3 | act_1
I'd like sequence_nbr to be a permanent column in my journal table, and I'd like it to be automatically incremented, that is, when I do an insert I don't have to specify a value for it, Postgres automatically computes its correct sequence number for me.
I have tried two different ways:
Creating a sequence, but I couldn't get it to depend on the value of account_id
Creating a function as in Postgres Dynamically Create Sequences, but I can't figure out how to pass the argument to the function to create a default on the column definition for the journal table.
Is there a way to accomplish what I want in Postgres?

Table design for subtables

I have a table Event which contains ideventtype (references the event type), date, idhost, etc.
Now I may have various event types, and each event type has its own related fields. How do I store these details?
Solution 1:-
Save it in the same table as event by adding another column.
Solution 2:-
Create another table, say event_birthday (birthday is the event type), and store the "birthday" related data there. However, if I have other event types I will have to create many tables, eventually making the event type useless.
Any suggestions or other ways to proceed?
[Constraint Added]
One constraint I have is that the subtable needs to have a foreign key to another table. For example, I have a hospital table to be linked to the location in the subtable.
I would consider using a NoSQL database, such as MongoDB, for this purpose. But if you're stuck with SQL Server, one option would be to save the event details as XML in a generic separate table and use serialization in code to process the values. This way you could still use queries for data filtering.
Two other approaches are also possible.
Solution 3:
Create another table, EventAddInfo, with four columns:
EventAddInfoID
EventId
EventSubCol
EventSubValue
Here we can store the extra columns as rows, with EventId as a foreign key to the Event table.
Example data would be:
| EventAddInfoID | EventId | EventSubCol | EventSubValue |
|----------------|---------|-------------|---------------|
| 10             | 4       | birthdate   | 2002-05-30    |
| 11             | 4       | birthplace  | Hospital      |
| 12             | 5       | Score       | 85            |
| 13             | 5       | Study       | Degree        |
Note: Consider EventId 5 related to Study Event
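A minimal sketch of this key/value layout, using Python's built-in sqlite3 module rather than SQL Server. The Event table's columns and the event type labels are assumptions made for the example; the EventAddInfo rows are the sample data shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE Event (
        EventId   INTEGER PRIMARY KEY,
        EventType TEXT NOT NULL              -- assumed column, for illustration
    );
    CREATE TABLE EventAddInfo (
        EventAddInfoID INTEGER PRIMARY KEY,
        EventId        INTEGER NOT NULL REFERENCES Event(EventId),
        EventSubCol    TEXT NOT NULL,
        EventSubValue  TEXT NOT NULL
    );
    """
)
conn.executemany("INSERT INTO Event VALUES (?, ?)", [(4, "birthday"), (5, "study")])
conn.executemany(
    "INSERT INTO EventAddInfo VALUES (?, ?, ?, ?)",
    [
        (10, 4, "birthdate", "2002-05-30"),
        (11, 4, "birthplace", "Hospital"),
        (12, 5, "Score", "85"),
        (13, 5, "Study", "Degree"),
    ],
)
conn.commit()


def event_details(conn, event_id):
    """Collect the extra attribute rows of one event into a dict."""
    cursor = conn.execute(
        "SELECT EventSubCol, EventSubValue FROM EventAddInfo "
        "WHERE EventId = ? ORDER BY EventAddInfoID",
        (event_id,),
    )
    return dict(cursor.fetchall())


print(event_details(conn, 4))
print(event_details(conn, 5))
```

One trade-off this makes visible: every value is stored as text, so Score comes back as the string "85", and a value column like this cannot carry a foreign key to a specific table such as hospital.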
Solution 4:
Add one extra column of type XML to the Event table, for example EventSubInfo.
In this column we can store XML with our own structure:
<Cols>
<birthdate>2002-05-30T09:00:00</birthdate>
<birthplace>Hospital</birthplace>
</Cols>
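On the application side, such a column value could be read and written roughly as follows. This is only a sketch with Python's standard xml.etree.ElementTree; the helper names parse_sub_info and build_sub_info are made up for the example, and it assumes a flat structure of one element per field.

```python
import xml.etree.ElementTree as ET

# A well-formed value for the hypothetical EventSubInfo column.
event_sub_info = """<Cols>
<birthdate>2002-05-30T09:00:00</birthdate>
<birthplace>Hospital</birthplace>
</Cols>"""


def parse_sub_info(xml_text):
    """Turn the flat <Cols> document into a {field: value} dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def build_sub_info(fields):
    """Serialize a {field: value} dict back into a <Cols> document."""
    root = ET.Element("Cols")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


details = parse_sub_info(event_sub_info)
print(details)
print(build_sub_info({"Score": 85, "Study": "Degree"}))
```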

Read data from two CSV files, join them, and store the result in a table using SSIS

I am new to SSIS (I am still learning, but I have been given a task). I have two CSV files, each with three columns. One file has FYON, Family, Number.
The other file has Family, Number, Description.
The Family and Number columns relate the two files to each other.
I want to read the values from those files and store the data in a SQL Server table with the columns below:
+----+------+--------+--------+-------------+
| ID | Fyon | Family | Number | Description |
+----+------+--------+--------+-------------+
| 1 | 50 | AP | 01 | SV32 |
+----+------+--------+--------+-------------+
I also want to store an error in an error table if the data is null or duplicated.
I don't know how to achieve this.
A Merge Join transformation in SSIS would do this for you. You need to read the data from both files, sort them, join them with Merge Join (the Number field alone should be sufficient as the key), and then push all the required columns from both sources to the SQL table (OLE DB Destination/SQL Server Destination).
Look at the second example here: http://www.learnmsbitutorials.net/ssis-merge-and-mergejoin-example.php
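To make the data flow concrete, here is the same validate-then-join logic sketched in plain Python. It is not an SSIS package, only an illustration of what the flow has to do. The sample rows beyond the one in the question are invented, and this sketch joins on both Family and Number.

```python
import csv
import io

# Hypothetical file contents; only the AP/01 row comes from the question.
FYON_CSV = """FYON,Family,Number
50,AP,01
51,AP,02
52,,03
"""
DESC_CSV = """Family,Number,Description
AP,01,SV32
AP,02,SV33
AP,02,SV34
"""


def read_rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def split_valid(rows, source):
    """Separate usable rows from rows with empty fields or repeated keys."""
    valid, errors, seen = {}, [], set()
    for row in rows:
        key = (row["Family"], row["Number"])
        if any(not (value or "").strip() for value in row.values()):
            errors.append((source, "null value", row))
        elif key in seen:
            errors.append((source, "duplicate key", row))
        else:
            seen.add(key)
            valid[key] = row
    return valid, errors


fyon_rows, fyon_errors = split_valid(read_rows(FYON_CSV), "fyon file")
desc_rows, desc_errors = split_valid(read_rows(DESC_CSV), "description file")
errors = fyon_errors + desc_errors  # these would go to the error table

# Inner join on (Family, Number); ID is generated like an identity column.
joined = []
for key, row in fyon_rows.items():
    if key in desc_rows:
        joined.append(
            {
                "ID": len(joined) + 1,
                "Fyon": row["FYON"],
                "Family": row["Family"],
                "Number": row["Number"],
                "Description": desc_rows[key]["Description"],
            }
        )

for record in joined:
    print(record)
for error in errors:
    print(error)
```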
I did it myself with the help of @deepak's answer. Finally, the flow is

How to merge two Excel sheets

I have an Excel document with 10000 rows of data in two sheets. One sheet has the product costs, and the other has the category and other information. Both are imported automatically from SQL Server, so I don't want to move the data to Access, but I still want to link the product codes so that when I merge the sheets into one table with product name and cost, I can be sure I'm getting the right information.
For example:
Code | name | category
------------------------------
1 | mouse | OEM
4 | keyboard | OEM
2 | monitor | screen
Code | cost |
------------------------------
1 | 123 |
4 | 1234 |
2 | 1232 |
7 | 587 |
Let's say my two sheets have tables like these. As you can see, the second one has an entry that doesn't exist in the first. I put it there because in reality one sheet has a few more entries, which prevents a perfect match. Therefore I can't just sort both tables A-Z and copy the costs over. As I said, there are more than 10000 products in that database, and those extra entries would shift the costs and ruin the whole table.
So what would be a good way to get the entry from the other sheet and insert it into the right row when merging? Linking the two tables by a field name? Checking a field and matching it against the other sheet? Anything at all.
Note: In Access I would define relationships, and a query would match the rows automatically. I was wondering if there's a way to do that in Excel too.
Why not use a vlookup? If there is a match, it will list the cost. Assuming the top is sheet1 and the other sheet2 and they both start on cell A1. You just need this in cell D2.
=VLOOKUP(A2,Sheet2!A:B,2,0)
You can then drag it down. The easiest way to fill all your 10000 rows is to hover over the bottom-right corner of the cell with your cursor. It will turn from a white plus sign into a thin black one. Then simply double-click.
Just use VLOOKUP: you can add a column to your first sheet and find the cost based on the code in the other sheet.
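For readers who think in code, the lookup that VLOOKUP performs here is an exact-match search by product code. A small Python sketch with the sample rows from the question shows the idea; the variable names are arbitrary.

```python
# Sheet 1: code, name, category. Sheet 2: code, cost (sample data from the question).
products = [(1, "mouse", "OEM"), (4, "keyboard", "OEM"), (2, "monitor", "screen")]
costs = [(1, 123), (4, 1234), (2, 1232), (7, 587)]

# Index the cost sheet by code, like the lookup range Sheet2!A:B.
cost_by_code = dict(costs)

# For each product row, fetch the cost with the same code.
# .get() returns None for a missing code, where VLOOKUP would show #N/A.
merged = [
    (code, name, category, cost_by_code.get(code))
    for code, name, category in products
]

for row in merged:
    print(row)
```

Because each row is matched by its code rather than by position, the extra entry with code 7 on the cost sheet cannot shift any values.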

How to Insert row between two rows and give it priority in database?

I have a stack of messages in a database table.
I want to send these messages by priority, so I added a "Priority" column to the "messages" table.
But what if I want to insert a "cram" message between two messages and give the previous priority to this new message?
Should I update the priority of every message below it?
Please suggest a table design that supports this kind of priority update.
Use a float column for Priority rather than an int.
Then, to insert a message between two others, assign the average of the two messages' Priority values as the new message's Priority. (E.g., to insert a cram message between messages with Priority 2 and 3, assign it a Priority of 2.5.)
By doing this, you don't have to update any other messages' priority, and you can continue to average/insert between those, etc. until you bump up against the decimal accuracy limits of a float (which will take awhile, especially if the raw Priority values tend to be small).
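The averaging rule, and the point where it stops working, can be sketched in a few lines of Python. Python floats are double precision, which is an assumption about the column type; the function name midpoint_priority is made up for the example.

```python
def midpoint_priority(before, after):
    """Return a priority strictly between two existing ones, or raise if none fits."""
    mid = (before + after) / 2
    if not before < mid < after:
        raise ValueError("no representable float left between these priorities")
    return mid


# The example from the answer: cram a message between priorities 2 and 3.
cram = midpoint_priority(2, 3)
print(cram)

# Worst case: keep cramming a new message directly after priority 2
# until the float precision is exhausted.
low, high = 2.0, 3.0
insertions = 0
while True:
    try:
        high = midpoint_priority(low, high)
    except ValueError:
        break
    insertions += 1
print(insertions)
```

In this worst case, where every new message goes into the same shrinking gap, the gap is used up after roughly fifty insertions, so a design based on this idea would probably still want an occasional renumbering step.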
Or, add another column after Priority in the ORDER BY. In the simplest case, use a bit column called "ShowAfter" with a default value of 0. When you insert a cram message, give it the same Priority as the message you want to see it after, but a [ShowAfter] value of 1.
Just a wild idea, and I haven't tested this for performance, but a linked-list kind of structure should get you what you want here. At most you will only need to change 3 records:
Find out where you want to put your new record.
Note which record comes before it and which record comes after it.
For the new record, set its previous record and its next record.
Relink the previous and next records to point to the new record.
You do this by adding 2 fields (next and previous) to the schema.
Just include a timestamp column with the default getdate() value. This way, when sending messages, order by priority asc, createtime desc.
If you don't always want to do Last-In-First-Out (LIFO), you can do order by priority, senddate and then set senddate to 1/1/1900 for anything you want pushed out first.
If you want to do it by some method of ranking them, you'd have to update every single row below a given priority if you wanted to "cram" a message in. With a getdate() default column, you just don't have to worry about that.
Interesting, maybe you could use an identity column as the primary key but use an increment that skips a few values?
This way, you would reserve space should you need to insert a message, or update a message's priority, so that it falls between two existing values.
Make sense?
Similar to @Jimmy Chandra's idea, but use a single link column.
So, you might have these columns:
ID | SortAfterID | OtherColumn1 | OtherColumn2
Now, let's say you have five records with IDs 1 through 5, and you want record 5 to be sorted between 2 and 3. Your table would look something like this:
ID | SortAfterID | OtherColumn1 | OtherColumn2
1 | NULL | ... | ...
2 | 1 | ... | ...
3 | 5 | ... | ...
4 | 3 | ... | ...
5 | 2 | ... | ...
I would set a constraint so that SortAfterID references ID.
If you now wanted to insert a new record (ID = 6) that goes between 1 and 2, you would:
Insert a new record with ID = 6 and SortAfterID = 1.
Update record with ID = 2 so that SortAfterID = 6.
I think this should be pretty low maintenance, and it's guaranteed to work no matter how many times you cram. The floating point number idea by @richardtallent should work as well, but as he mentioned, you could run into a limit.
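The relinking steps can be tried out in memory with a small Python sketch. A dict stands in for the table here (ID mapped to SortAfterID), and the helper names insert_after and in_order are made up for the example. In a real table the two changes would be an INSERT and an UPDATE inside one transaction.

```python
# ID -> SortAfterID, the five records from the example table.
records = {1: None, 2: 1, 3: 5, 4: 3, 5: 2}


def insert_after(records, new_id, after_id):
    """Place new_id directly after after_id, relinking the record that used to follow it."""
    for rec_id, sort_after in records.items():
        if sort_after == after_id:
            records[rec_id] = new_id  # the old follower now sorts after the new record
            break
    records[new_id] = after_id


def in_order(records):
    """Walk the chain from the record whose SortAfterID is NULL."""
    follower = {sort_after: rec_id for rec_id, sort_after in records.items()}
    ordered, current = [], follower.get(None)
    while current is not None:
        ordered.append(current)
        current = follower.get(current)
    return ordered


before = in_order(records)
insert_after(records, 6, 1)  # new record 6 goes between 1 and 2
after = in_order(records)
print(before)
print(after)
```

The walk in in_order also shows the cost of this design: reading the messages in order means following the chain, which in SQL typically takes a recursive query rather than a plain ORDER BY.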
EDIT
I just noticed the paragraph at the end of @richardtallent's answer that mentions this same idea, but since I typed this all out, I figure I'll keep it here since it provides a little extra detail.
