Totals in reports when some values are calculated? (MS Access database)

I am using a report to show wastage calculations after converting our stock.
I am trying to get the totals to display at the bottom; however, some of the columns are made up of fields that are calculated (such as sValue-Wastage).
For these fields I can't seem to use =Sum(rValue); it acts as though this is a parameter to be supplied by the user when I enter it, and it also adds square brackets around rValue (=Sum([rValue])).
Is there some different way of achieving this that I need to know about? I am trying to get:
=Sum(Wastage)
Where Wastage is:
=[sValue]-[rValue]
Thanks,
Bob P

You must sum the calculation over the two underlying fields, not the calculated control:
=Sum([sValue]-[rValue])
From your comments, it appears that rValue holds a domain aggregate function. You can try:
=Sum([sValue]-DSum("sValue","[Stock Conversion Items]"))
This will be okay if all SCIDs in [Stock Conversion Items] should be included; if not, you may be able to add a criteria ("WHERE") argument to the DSum. If the relationship is too complicated for criteria, consider basing your report on a query that includes [Stock Conversion Items].
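For example, if each report record should only pull the matching rows by SCID (the exact criteria is a guess here; SCID is assumed to be a numeric linking field, so adjust the criteria string if it is text), the expression would look something like:
=Sum([sValue]-DSum("sValue","[Stock Conversion Items]","SCID = " & [SCID]))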
It is an old story: http://support.microsoft.com/kb/208850

Related

SQL Server Unexpected Results from Data Type Change

I am working with acoustic data that shows decibel levels broken down between frequencies (1/3 octave bands). These values were imported from a flat text file and all have 1 decimal (e.g., 74.1 or -8.0).
I need to perform a series of calculations on the table in order to obtain other acoustic measures (calculating minute-level data by applying acoustic formulas to my given second-level data). I am attempting to do this with a series of nested select statements. First, I needed to get the decibel values divided by 10. I did fine with that. Now I'd like to feed the generated fields output from this select statement into another that raises 10 to the power of my generated values.
So, if the 20000_Hz field had a value of 16.3, my generated table would have a value of 1.63 for that record, and I'd like to nest that into another select statement that generates 10^1.63 for that field and record.
To do this, I've been experimenting with the POWER() function. I tried POWER(10,my_generated_field) and got all zeros. I realized that the data type of the base determines the data type of the output, meaning that if I did something like POWER(10.0000000000000000000,my_generated_field) I'd start to see actual numbers like 0.0000000000032151321. Also, I tried altering my table to change the data type for decibel values to decimal(38,35) to see what effect this would have. I believe I initially set the data type as float using the flat file import tool.
To my surprise, numbers that were imported from the flat text file did not simply have more zeros tacked on the end, but had other numbers. For instance, a number like 46.8 now might read something like 46.8246546546843543210058 rather than 46.8000000000000000 as I'd expect.
So my two questions are:
1) Why did changing data types not create the results I expected, and where is SQL getting these other numbers?
2) How should I handle data types for my decibel values so that I don't lose accuracy when doing the 10^field_value thing?
I've spent some time reading about data types, the POWER() function, etc., but still don't feel like I'm going to understand this on my own.
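For reference, a small T-SQL sketch of the behavior described above (the 1.63 literal just stands in for one of the generated field values):
SELECT
    POWER(10, 1.63)                AS int_base,    -- integer base: the fractional part of the result is lost
    POWER(10.0, 1.63)              AS one_dp_base, -- decimal(3,1) base: the result keeps roughly one decimal place
    POWER(CAST(10 AS float), 1.63) AS float_base,  -- float base: the full fractional result comes back
    POWER(10, -0.8)                AS zeroed;      -- integer base with a negative exponent: ~0.158 collapses to 0
In other words, the data type of the first argument drives the data type of the result, so keeping the decibel column as float and casting the base to float (or, equivalently, using EXP([value] * LOG(10))) seems like a safer route than decimal(38,35).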

Multiple IF QUARTILEs returning wrong values

I am using a nested IF statement within a QUARTILE wrapper, and it only partly works: it returns values that are slightly off from what I would expect if I calculate the quartile of the range manually.
I've looked around, but most of the posts and research are about designing the formula; I haven't come across anything compelling in terms of this odd behaviour I'm observing.
My formula (entered with Ctrl+Shift+Enter as it's an array formula): =QUARTILE(IF(((F2:$F$10=$W$4)*($Q$2:$Q$10=$W$3))*($E$2:$E$10=W$2),IF($O$2:$O$10<>"",$O$2:$O$10)),1)
The full dataset:
0.868997877*
0.99480118
0.867040346*
0.914032128*
0.988150438
0.981207615*
0.986629288
0.984750004*
0.988983643*
*The formula has 3 AND conditions that need to be met and should return range:
0.868997877
0.867040346
0.914032128
0.981207615
0.984750004
0.988983643
The 25th percentile is then calculated from this range.
If I take the output from the formula, the 25th percentile (QUARTILE, 1) is 0.8803, but if I calculate it manually from the data points right above, it comes out to 0.8685, and I can't see why.
I feel it's because the IF statements identify a slightly different range, or the values that meet the IF conditions come from different rows, or something.
If you look at the table here you can see that there is more than one way of estimating a quartile (or other percentile) from a sample, and Excel has two: the one you are doing by hand must be like Quartile.exc, and the one you are using in the formula is like Quartile.inc.
Basically both formulas work out the rank of the quartile value. If it isn't an integer it interpolates (e.g. if it was 1.5, that means the quartile lies half way between the first and second numbers in ascending order). You might think that there wouldn't be much difference, but for small samples there is a massive difference:
Quartile.exc: Rank = (N+1)/4
Quartile.inc: Rank = (N+3)/4
Here's how it would look with your data
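With your six qualifying values sorted ascending (0.867040346, 0.868997877, 0.914032128, 0.981207615, 0.984750004, 0.988983643), N = 6, so:
Quartile.exc: Rank = (6+1)/4 = 1.75, i.e. 75% of the way from the 1st to the 2nd value: 0.867040346 + 0.75 * (0.868997877 - 0.867040346) ≈ 0.8685
Quartile.inc: Rank = (6+3)/4 = 2.25, i.e. 25% of the way from the 2nd to the 3rd value: 0.868997877 + 0.25 * (0.914032128 - 0.868997877) ≈ 0.8803
So your hand calculation (0.8685) matches Quartile.exc, while the formula's output (0.8803) matches Quartile.inc.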

Why is this Formula for Alteryx returning 0's instead of averages

I was wondering what is wrong with the following formula.
IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF
What I am trying to do: "If the cell is blank, then fill the cell with the average of all the other values in [Age]."
Many thanks all!
We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.
The First Way: Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" on Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantages of using the tool directly are that it is much less complicated than the other way of doing it. The downsides are 1) if you are attempting to fix a large number of rows relative to machine specs it can be incredibly slow (much slower than the next way), and 2) it occasionally errors when we use it in our process without much explanation.
The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream. You will have the same average values for each row in your database. At that point, you can use the Formula tool as you attempted in your question. E.g.
IF IsNull([Age]) THEN [Ave_Age] ELSE [Age] ENDIF
The second way is significantly faster to run for extremely large datasets (e.g. fixing possible nulls in a few dozen columns over 70 million rows), but is much more time intensive to set up and must be created for each column.
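For what it's worth, the same impute-with-the-column-average idea expressed in plain SQL (table and column names are hypothetical) looks like this:
SELECT COALESCE(Age, AVG(Age) OVER ()) AS Age_imputed
FROM People;
The window average plays the role of the Summarize + Append pair: every row sees the column-wide average, and COALESCE swaps it in only where Age is null.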
That is not the way the Average function works. You need to pass it the entire list of values, not just one.

How to represent end-of-time in a database?

I am wondering how to represent an end-of-time (positive infinity) value in the database.
When we were using a 32-bit time value, the obvious answer was the actual 32-bit end of time - something near the year 2038.
Now that we're using a 64-bit time value, we can't represent the 64-bit end of time in a DATETIME field, since 64-bit end of time is billions of years from now.
Since SQL Server and Oracle (our two supported platforms) both allow years up to 9999, I was thinking that we could just pick some "big" future date like 1/1/3000.
However, since customers and our QA department will both be looking at the DB values, I want it to be obvious and not appear like someone messed up their date arithmetic.
Do we just pick a date and stick to it?
Use the max collating date, which, depending on your DBMS, is likely going to be 9999-12-31. You want to do this because queries based on date ranges will quickly become miserably complex if you try to take a "purist" approach like using Null (as suggested by some commenters) or using a forever flag (as suggested by Marc B).
When you use the max collating date to mean "forever" or "until further notice" in your date ranges, it makes for very simple, natural queries:
Find me records that are in effect as of a given point in time.
... WHERE effective_date <= #PointInTime AND expiry_date >= #PointInTime
Find me records that are in effect over the following time range.
... WHERE effective_date <= #StartOfRange AND expiry_date >= #EndOfRange
Find me records that have overlapping date ranges.
... WHERE A.effective_date <= B.expiry_date AND B.effective_date <= A.expiry_date
Find me records that have no expiry.
... WHERE expiry_date = #MaxCollatingDate
Find me time periods where no record is in effect.
OK, so this one isn't simple, but it's simpler using max collating dates for the end point. See: this question for a good approach.
Using this approach can create a bit of an issue for some users, who might find "9999-12-31" confusing in a report or on a screen. If this is going to be a problem for you, then drdwicox's suggestion of translating to a user-friendly value is good. However, I would suggest that the user interface layer, not the middle tier, is the place to do this, since what is most sensible or palatable may differ depending on whether you are talking about a report or a data entry form, and on whether the audience is internal or external. In some places what you might want is a simple blank; in others, the word "forever"; in others, an empty text box with a check box that says "Until Further Notice".
In PostgreSQL, the end of time is 'infinity'. It also supports '-infinity'. The value 'infinity' is guaranteed to be later than all other timestamps.
create table infinite_time (
ts timestamp primary key
);
insert into infinite_time values
(current_timestamp),
('infinity');
select *
from infinite_time
order by ts;
2011-11-06 08:16:22.078
infinity
PostgreSQL has supported 'infinity' and '-infinity' since at least version 8.0.
You can mimic this behavior, in part at least, by using the maximum date your dbms supports. But the maximum date might not be the best choice. PostgreSQL's maximum timestamp is some time in the year 294,276, which is sure to surprise some people. (I don't like to surprise users.)
2011-11-06 08:16:21.734
294276-01-01 00:00:00
infinity
A value like this is probably more useful: '9999-12-31 11:59:59.999'.
2011-11-06 08:16:21.734
9999-12-31 11:59:59.999
infinity
That's not quite the maximum value in the year 9999, but the digits align nicely. You can wrap that value in an infinity() function and in a CREATE DOMAIN statement. If you build or maintain your database structure from source code, you can use macro expansion to expand INFINITY to a suitable value.
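A minimal sketch of that idea in PostgreSQL (the function, domain, and table names here are only illustrative):
create function end_of_time() returns timestamp as
$$ select timestamp '9999-12-31 11:59:59.999' $$
language sql immutable;
create domain timestamp_eot as timestamp
  default end_of_time();
-- columns declared with the domain default to "end of time" when no expiry is supplied
create table policy (
  effective_date timestamp not null,
  expiry_date    timestamp_eot not null
);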
We sometimes pick a date, then establish a policy that the date must never appear unfiltered. The most common place to enforce that policy is in the middle tier. We just filter the results to change the "magic" end-of-time date to something more palatable.
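A sketch of that kind of filter in SQL terms (the sentinel value and the names are only for illustration):
SELECT contract_id,
       CASE WHEN expiry_date = '9999-12-31'
            THEN NULL                      -- or the text 'forever', depending on what the screen or report needs
            ELSE expiry_date
       END AS expiry_display
FROM contract;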
Representing the notion of "until eternity" or "until further notice" is an iffy proposition.
Relational theory proper says that there is no such thing as null, so you're obliged to have whatever table it is split in two: one part with the rows for which the end date/end time is known, and another for the rows for which the end time is not yet known.
But (like having a null) splitting the table in two will make a mess of your query writing too. Views can somewhat accommodate the read-only parts, but updates (or writing INSTEAD OF triggers on your views) will be tough no matter what, and likely to hurt performance at that.
Having the null represent "end time not yet known" will make updating a bit "easier", but the read queries get messy with all the CASE ... or COALESCE ... constructs you'll need.
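For instance, with null standing for "no expiry yet", the point-in-time query from the earlier answer turns into something like:
... WHERE effective_date <= #PointInTime AND COALESCE(expiry_date, '9999-12-31') >= #PointInTime
and every query that touches the end date has to remember that wrapper (which can also defeat an index on expiry_date).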
Using the theoretically correct solution mentioned by dportas gets messy in all those cases where you want to "extract" a DATE from a DATETIME. If the DATETIME value at hand is "the end of (representable) time (billions of years from now as you say)", then this is not just a simple case of invoking the DATE extractor function on that DATETIME value, because you'd also want that DATE extractor to produce the "end of representable DATEs" for your case.
Plus, you probably do not want to show "absent end of time" as being a value 9999-12-31 in your user interface. So if you use the "real value" of the end of time in your database, you're facing a bit of work seeing to it that that value won't appear in your UI anywhere.
Sorry for not being able to say that there's a way to stay out of all messes. The only choice you really have is which mess to end up in.
Don't make a date be "special". While it's unlikely your code would be around in 9999 or even in 2^63-1, look at all the fun that using '12/31/1999' caused just a few years ago.
If you need to signal an "endless" or "infinite" time, then add a boolean/bit field to signal that state.
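A sketch of how that plays out in the range queries from the earlier answer (names illustrative, is_open_ended being the added bit column):
... WHERE effective_date <= #PointInTime AND (is_open_ended = 1 OR expiry_date >= #PointInTime)
No magic date is stored, but every range predicate now has to carry the extra flag test, which is the trade-off the other answers are pointing at.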

Wrap a SQL Reporting Matrix

I have a matrix in SQL Reporting and I would like it to print on an A4 page. If the matrix has fewer than 4 columns it fits, but for more than 4 columns I would like the matrix to wrap and show only 4 columns per page. Is this possible? I am using SQL Reporting 2005 in local mode.
I found a work around:
First I added a field to my datasource called ColumnCount. Because the datasource is built in a business object, it was easy for me to tell how many columns of data there are.
Next I created a list on my report and moved my matrix into the list.
I made the group expression =Ceiling(Fields!ColumnCount.Value/4) for the list.
In short I am telling the list to break every 4 columns. This causes the matrix to be split after 4 columns.
This will not work in all scenarios and probably screws up subtotalling but it worked for my application.
Disclaimer: this was not my idea...I adapted it from Chris Hays's Sleezy Hacks.
There is no way to intrinsically wrap columns; MBoy's solution above is very similar to what I have done in the past, so I won't repeat his steps here, although I will warn you: for matrices with a large number of columns you will grow the number of pages in your report exponentially. In your case this may not be a problem, but we have found that in most cases it is cheaper (in terms of page output) not to wrap columns.
Further to MBoy's answer, I wanted to show multiple charts on one page, but the number of charts would vary depending on the data. What I wanted was to show two charts on a row with as many rows as necessary. I did as follows:
As suggested by MBoy, I created a 'Count' field called [ChartNumber] in the data that increases by one for each chart (so if I had 7 charts, rows would be numbered 1-7).
To achieve this I used the DENSE_RANK() SQL function to create a field in my query, such as DENSE_RANK() OVER (ORDER BY [Data].[ItemtoCount]) AS [ChartNumber].
So if I wanted a different chart for each department I might use DENSE_RANK() OVER (ORDER BY [Data].[Department]) AS [ChartNumber]
I added a list to the report and bound it to my dataset
I then set the row group to group on =Ceiling(Fields!ChartNumber.Value/2)
I then added a column group on =Ceiling(Fields!ChartNumber.Value Mod 2)
Create a chart inside the list and preview, and you should see two charts side-by-side on each row.
I used charts, but you could easily put a matrix or any other item inside the list.
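To see how the two group expressions tile the charts, take the first four chart numbers:
ChartNumber 1: row = Ceiling(1/2) = 1, column = Ceiling(1 Mod 2) = 1
ChartNumber 2: row = Ceiling(2/2) = 1, column = Ceiling(2 Mod 2) = 0
ChartNumber 3: row = Ceiling(3/2) = 2, column = Ceiling(3 Mod 2) = 1
ChartNumber 4: row = Ceiling(4/2) = 2, column = Ceiling(4 Mod 2) = 0
Charts 1 and 2 land on the first row (in columns 1 and 0), charts 3 and 4 on the second row, and so on.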
Edit: A more general solution for n columns is =Ceiling(Fields!ChartNumber.Value / n) for the row group and =Fields!ChartNumber.Value Mod n for the column group.
I don't think so. I've found that exporting to Excel and then printing is the most flexible way of printing SSRS matrix reports, especially since most of my users know Excel well.
According to MSDN, Tablix data regions do pagination horizontally in much the same way a table does it vertically, which is to say you can specify a page break on a group change. There is another MSDN article that suggests the use of a pagination expression, but this technique is already explained by MBoy so I won't repeat it, except to say that it is an endorsed technique.
