I'm trying to query some data in Postgres and I'm wondering how I might use some sort of pattern matching not merely to select rows - e.g. SELECT * FROM schema.tablename WHERE varname ~ 'phrase' - but to select columns in the SELECT statement, specifically doing so based on the common names of those columns.
I have a table with a bunch of estimates of rates over the course of many years - say, of apples picked per year - along with upper and lower 95% confidence intervals for each year. (For reference, each estimate and 95% CI comes from a different source - a written report, to be precise; these sources are my rows, and the table describes various aspects of each source. Based on a critique in the comments below, I think it's important that the reader know that the unit of analysis in this relational database is a written report with different estimates of things picked per year - apples in one table, oranges in another, pears in a third.)
So in this table, each year has three columns / variables:
rate_1994
low_95_1994
high_95_1994
The thing is, the CIs are mostly null - they haven't been filled in. In my query, I'm really only trying to pull out the rates for each year: all the variables that begin with rate_. How can I phrase this in my SELECT statement?
I'm trying to employ regexp_matches to do this, but I keep getting back errors.
I've done some poking around StackOverflow on this, and I'm getting the sense that it may not even be possible, but I'm trying to make sure. If it isn't possible, it's easy to break up the table into two new ones: one with just the rates, and another with the CIs.
(For the record, I've looked at posts such as this one:
Selecting all columns that start with XXX using a wildcard?)
Thanks in advance!
If what you are asking is whether columns can be selected dynamically based on an execution-time condition:
No.
You could, however, use PL/pgSQL to build up the query as a string and then run it with EXECUTE (the Postgres counterpart of Oracle's EXECUTE IMMEDIATE).
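A minimal sketch of that approach, assuming a hypothetical table public.apple_rates: first generate the statement text from the catalog, then run it dynamically.

-- Build a SELECT list of every column whose name starts with rate_
-- (the schema and table names here are hypothetical).
SELECT format('SELECT %s FROM public.apple_rates',
              string_agg(quote_ident(column_name), ', '
                         ORDER BY ordinal_position))
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'apple_rates'
  AND column_name  ~ '^rate_';

The string this produces can then be run with EXECUTE inside a PL/pgSQL function (typically one returning SETOF record or a refcursor), since plain SQL cannot take a column list that is decided at execution time.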
I have an SQL query that gives me a data set with 3 columns:
Contract Code
Volume
MonthRegistered
I want to present this data grouped in rows by Contract_Code and in columns by MonthRegistered.
I then want to calculate a percentage difference between the months.
In this case I will only ever have two months' worth of data, each one year apart.
I am trying to express the percentage variation from one year to the next for each row of data.
I did this expression:
=(Fields!Volume.Value)/(Fields!Volume.Value)
but clearly it was not right: it does not address the two month columns independently.
I did format the tablix text box as a percentage, so at least I figured that one out.
In this TechNet article, Calculating Totals and Other Aggregates (Reporting Services), it states: "You can also write your own expressions to calculate aggregate values for one scope relative to another scope." I couldn't find a reference to how to address the separate scopes.
I would appreciate any pointers on this one please!
Sorry for posting my examples as JPG rather than actual text but I needed to hide some of the data...
This only works because you will only ever have two months' worth of data to compare. You have to make sure that your SQL has already ordered by MonthRegistered; if you do not order in your query, then SSRS's own sorting will be applied to determine which value is first and last.
=First(Fields!Volume.Value) / Last(Fields!Volume.Value)
Because you have performed the aggregation in SSRS, you may have to wrap each value in a Sum expression.
It would be advisable to perform the aggregation in SQL where possible, if you only plan on showing it in this way.
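A minimal sketch of the dataset query under that advice, with hypothetical table and column names: the aggregation is done in SQL, and the rows come back ordered so that First() and Last() are deterministic.

-- Aggregate per contract and month, ordered so the earlier month is first.
SELECT Contract_Code, MonthRegistered, SUM(Volume) AS Volume
FROM ContractVolumes
GROUP BY Contract_Code, MonthRegistered
ORDER BY MonthRegistered;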
I want to store non-Gregorian datetime values in my database (PostgreSQL or SQL Server).
I have two ways to do this:
1. Store a standard datetime in the database and convert it to my local date system in the application.
2. Store the datetime as varchar in two separate fields (a date field and a time field), in YYYY-MM-DD and HH:MM:SS format in my local date system.
Which way is better for performance, given that tables may contain thousands or millions of rows and I sometimes need to order rows?
Storing dates as strings will generally be very inefficient, both in storage and in processing. In Postgres, you have the possibility of defining your own type, and overloading the existing date functions and operators, but this is likely to be a lot of work (unless you find that someone did it already).
A quick search turned up this old mailing list thread, where one suggestion is to build input and output functions around the existing date types. This would let you make use of some existing functions (for instance, I'm guessing that intervals such as '1 day' and '1 year' have the same meaning; forgive my ignorance if not).
Another option would be to use integers or floats for storage, e.g. a Unix timestamp is a number of seconds since a fixed time, so has no built-in assumption about calendars. Unlike a string representation, however, it can be efficiently stored and indexed, and has useful operations defined such as sorting and addition. Internally, all dates will be stored using some variant of this approach; a custom type would simply keep these details more conveniently hidden.
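A minimal sketch of the integer approach in Postgres, with hypothetical names; the column stores seconds since the Unix epoch, so ordering and range scans work on a plain bigint with no calendar assumptions.

CREATE TABLE events (
    id       bigserial PRIMARY KEY,
    occurred bigint NOT NULL  -- seconds since 1970-01-01 00:00:00 UTC
);
CREATE INDEX ON events (occurred);

-- Sorting needs no calendar knowledge; convert only for display:
SELECT id, to_timestamp(occurred) AS gregorian_view
FROM events
ORDER BY occurred;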
I have a table to keep flights, and I want to keep track of which days of the week each flight operates.
There is no need for a date, since I only need day names.
At first I thought of having a column in the flight table holding a single string with the day names inside, and using my application logic to unravel the information.
This seems OK, since the only operation on the days will be to retrieve them.
The thing is, I don't find this "clean" enough, so I thought of making a separate table to keep all 7 day names, and a many-to-many (auto-generated) table to keep the flight_id and day_id.
Still, there are only 7 fixed values in the days table, so I am not so sure about the second approach either.
What I would like are some other opinions on how to handle this.
A flight can operate on many different days of the week.
Only day names are needed - so, 7 in total.
Sorry for the bad English, and if this is a trivial question for some; I am not too experienced with either the English language or databases.
Some databases support arrays; PostgreSQL, for example, does.
You could store the days in an array of integers and use a function to translate the integers to day names. You could also use an array of a custom enum type (PostgreSQL example).
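A minimal sketch of the integer-array variant in PostgreSQL, with hypothetical names and ISO day numbers (1 = Monday ... 7 = Sunday):

CREATE TABLE flight (
    flight_id serial PRIMARY KEY,
    days      int[] NOT NULL CHECK (days <@ ARRAY[1,2,3,4,5,6,7])
);

INSERT INTO flight (days) VALUES (ARRAY[1,3,5]);  -- Mon, Wed, Fri

-- Flights operating on Wednesday; a GIN index on days speeds this up:
CREATE INDEX ON flight USING gin (days);
SELECT flight_id FROM flight WHERE days @> ARRAY[3];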
This is a completely hypothetical question: let's say I have a database where I need to store memberships for a user, which can last for a specific amount of time (1 month, 3 months, 6 months, 1 year, etc).
Is it better to have a table Memberships that has fields (each date being stored as a unix timestamp):
user_id INT, start_date INT, end_date INT
or to store it as:
user_id INT, start_date INT, length INT
Either way, you can query for users with an active membership (for example). For the latter design, arithmetic would need to be performed every time that query is run, while the former only requires the end date to be calculated once (on insertion). From this point of view, the former design seems better - but are there any drawbacks to it? Are there any common problems that can be avoided by storing the length instead that cannot be avoided by storing the date?
Also, are unix timestamps the way to go when storing time/date data, or is something like DATETIME preferred? I have run into problems with both datatypes (excessive conversions) but usually settle on unix timestamps. If something like DATETIME is preferred, how does this change the answer to my previous design question?
It really depends on what type of queries you'll be running against your data. If queries involve searching by start/end time or by a range of dates, then definitely go with the first option.
If you are more interested in statistics (What is the average membership period? How many people are members for more than one year?) then I'd choose the second option.
Regarding excessive conversions: which language are you programming in? Java/Ruby use Joda-Time under the hood, and it simplifies date/time-related logic a lot.
I would disagree. I would have a start and end date, to save performing calculations every time.
It depends on whether you want to index the end date, which in turn depends on how you want to query the data.
If you do, and if your DBMS doesn't support function-based indexes or indexes on calculated columns, then your only recourse is to have a physical end_date so you can index it directly.
Other than that, I don't see much of a difference.
BTW, use the native date type your DBMS provides, not int. First, you'll achieve some measure of type safety (you'll get an error if you try to read/write an int where a date is expected); it prevents you from creating mismatched referential integrity (although FKs on dates are rare); it can handle time zones (depending on the DBMS); and the DBMS will typically provide you with functions for extracting date components, etc.
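A minimal sketch of the first design along those lines (hypothetical names, Postgres syntax), with the physical end_date indexed as suggested above:

CREATE TABLE memberships (
    user_id    int  NOT NULL,
    start_date date NOT NULL,
    end_date   date NOT NULL,
    CHECK (end_date >= start_date)
);
CREATE INDEX ON memberships (end_date);

-- Users with an active membership today:
SELECT user_id
FROM memberships
WHERE CURRENT_DATE BETWEEN start_date AND end_date;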
From a design point of view, I find it better to have a start date and the length of the membership.
The end date is a derivative of the membership start date plus the duration. This is how I think of it.
The two strategies are functionally equivalent, so pick your favorite.
If the membership may toggle over time I would suggest this option:
user_id INT,
since_date DATE,
active_membership BIT
where the active_membership state is what is toggled over time, and since_date keeps track of when this happened. Furthermore, if you have a finite set of allowed membership lengths and need to keep track of which length a certain user has picked, this can be extended to:
user_id INT,
since_date DATE,
active_membership BIT,
length_id INT
where length_id would refer to a lookup table of available and allowed membership lengths. However, please note that in this case since_date becomes ambiguous if it is possible to change the length of your membership. In that case you would have to extend this even further:
user_id INT,
active_membership_since_date DATE,
active_membership BIT,
length_since_date DATE,
length_id INT
With this approach it is easy to see that normalization breaks down when the two dates change asynchronously. In order to keep this normalized you actually need 6NF. If your requirements are going in this direction I would suggest looking at Anchor modeling.
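A hedged sketch of what that 6NF-style split could look like (hypothetical names, Postgres syntax): each independently changing attribute gets its own history table, so the two dates can change asynchronously without breaking normalization.

CREATE TABLE membership_lengths (
    length_id int PRIMARY KEY,
    months    int NOT NULL  -- e.g. 1, 3, 6, 12
);

-- History of the on/off membership state:
CREATE TABLE membership_status_history (
    user_id    int     NOT NULL,
    since_date date    NOT NULL,
    active     boolean NOT NULL,
    PRIMARY KEY (user_id, since_date)
);

-- History of the chosen length, versioned independently:
CREATE TABLE membership_length_history (
    user_id    int  NOT NULL,
    since_date date NOT NULL,
    length_id  int  NOT NULL REFERENCES membership_lengths,
    PRIMARY KEY (user_id, since_date)
);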