Are Views or Functions faster in SQL? - sql-server

I have a table with customer receipts. I'm trying to generate a report based on the user's name, address, and purchases total based by department. The desired output should look like
|Customer |Address | Clothing | Electronics | Hardware | Household |
|Homer Simpson | 724 Evergreen Terr | $42 | $20 | $500 | $24 |
|Walter White | 308 Negra Arroyo Lane | $120 | $80 | $52 | $2400 |
The receipts table is part of a temporal model. So, the code looks like:
Select c.customername,a.address,r.receiptno,ir.department,ir.total
from customer c
inner join customer_address_lnk cal on cal.customerid = c.id
inner join address a on cal.addressid = a.id
inner join customer_receipts_lnk crl on crl.customerid = c.id
inner join receipts r on crl.receiptid = r.id
inner join receipts_receiptitem_lnk rrl on rrl.receiptid = r.id
inner join receiptitem ri on ri.id = rrl.receiptitemid
The lnk tables are linking tables.
The receiptitem table has the following columns: ID, Department, Amount, CreatedDate, UpdatedDate
The idea is that if the receipt is updated, the updated amount can be adjusted for returns, price adjustments, and so forth.
The goal is to get the query under 5 sec. Since we have over 125 million rows in the receiptitems table alone, it takes SQL 20+ minutes to calculate the report.
I've tried CTE's on views without success. I've tried different JOIN orders. I've used LEFT Joins. Even Pivot didn't slow it down. I still can't get it under 20 minutes.
Before I start down the path of creating a Function to get it under the 5 second goal, I'm open to any suggestions. I have limited ability to alter indices at this time.
Any thoughts?

Well, obviously views and SQL functions are different things.
Try to use a function where it needs to be clear to a user in the future (maybe yourself!) that the data returned requires certain parameters where the data does not make sense without those parameters. Sort of like forcing the user to include a WHERE clause.
In your example, you may want to force the user to filter by CustomerId or ReceiptId.
HOWEVER....
In this case, the view approach would probably be better.
Functions, by design, do not use temporary tables, but use table variables instead. Tables as variables are much slower than temp tables.
The query you've included is really straight forward with no surprises. The view would be the simplest and best approach here.
For 125M rows, I suggest either checking execution plan during processing (include a WHERE clause for this) or dumping data into a summary table that is updated periodically. Or both. Check indexes all along the way.
Here is more (better) discussion Test SQL Queries

Related

SQL If Else Logic

I have a staging table with the following structure
ID | BookID |Title | Cost |
----------------------------
1 | Test |1234 | 1234 |
This is populated through my system when an excel file is picked up, and I place all of the values inside this sheet into my staging table
I also have another table, for this example I'll call my Specials tables. It has an identical structure to my staging table.
ID | BookID |Title | Cost |
----------------------------
1 | Test |Mr Men | 4,99 |
What I'm doing now is amending a proc that is doing a whole host of calculations based on the data inside my staging table. A typical call to this looks like this:
BookTitle = dbo.StagingTable.Title
My amendment needs to check to see if in the books name in my staging table is also in the specials table. If it is, then I should bring back that data instead of the data inside of my staging table.
The BookId values are the same in both and I'm doing an Left Outer Join to tie them both together. What I'm struggling with is figuring out the correct syntax to do what I want.
LEFT OUT JOIN dbo.Specials s on dbo.StagingTable.BookId = s.BookId
Could someone point me in the right direction please?
The above is just small snippets from a larger proc that I can't share. So if things seem odd, that's why. I've simply taken the bits I could to help better explain my issue.
In T-SQL, you can do:
SELECT * FROM dbo.StagingTable
WHERE StagingTable.Title
IN (SELECT Specials.Title FROM dbo.Specials)
to get all the rows in the StagingTable that have a Title that is also present in the Specials table.
Please test following SELECT statement
I hope that is what you require
select
staging.ID,
staging.BookID,
Title = case when staging.Title <> isnull(Specials.Title,staging.Title) then Specials.Title else staging.Title end,
staging.Cost
from staging
left outer join Specials on staging.BookID = Specials.BookID

Metadata database design

I am trying to store meta data about a document into a SQL Server. The document are stored into a document archive, and returns back an identifier so I can get back that document by asking the archive to get the document by identifier.
Our user would like to be able to search for this document based on different meta data. The meta data could be 1 attribute or 5 depending on the document type, and the users should be able to create new document types from a admin site.
I can see two solution here. One is that each documenttype gets it's own metadata table, where all metadata attributes are predefined, and if one should be added a new column needs to be created. And if a new documenttype is created a new metadata table needs to be created. Our DBA will freak out with a solution like this, and I also see a problem with indexes. Because if the documenttype has 5 different meta data attributes it needs to be searchable with 1 or 4 of them specified in the search. Then I would need to write index for all the different combinations of possible searchs.
here is an example (fictiv)
|documentId | Name | InsertDate | CustomerId | City
| 1 | John | 2014-01-01 | 2 | London
| 2 | John | 2014-01-20 | 5 | New York
| 3 | Able | 2014-01-01 | 10 | Paris
I could here say:
Give me all documents where Name = 'John'
Give me all documets where Name = 'John' And CustomerId = 5
Give me all document where InserDate = '2014-01-01' and City = 'London'
This will be 3 differnet indexes and then I haven't coverd all possible combinations. This isn't practical.
So I am look in to the evil 'EAV' (anti)pattern.
So instead of having the metadata as columns I can have the as rows.
|documentId | MetaAttribute | MetaValue
| 1 | Name | John
| 1 | InsertDate | 2014-01-01
| 1 | CustomerId | 2
| 1 | City | London
| 2 | Name | John
| 2 | InsertDate | 2014-01-20
| 2 | CustomerId | 5
| 2 | City | New York
| 3 | Name | Able
| 3 | InserDate | 2014-01-01
| 3 | CustomerId | 10
| 3 | City | Paris
Here it's simple to create one index om MetaAttribute och metaValue, and it's covered. If a new documenttype is created, new metadata can be created with that documenttype into a MetaAttributeTable (that contains all MetaAttribute for the different documenttype). So no need to create new tables or coulms if a new documenttype is added or if a new attribute is added to a documenttype. Instead all MetaValues most be strings :( and the SQL Query to find the document id is a bit more complicated.
This is what I figured out. (In this example the MetaAttribute is a string, but would be an ID to the MetaAttribute Table)
SELECT * FROM [Document]
WHERE ID IN (SELECT documentId FROM [MetaData]
WHERE ((MetaAttribute = 'Name' AND MetaValue = 'John')
OR (MetaAttribute = 'CustomerId' and MetaValue = '5'))
GROUP BY [documentId]
HAVING Count(1) = 2)
Here I need to ask if the Name = 'John' and CustomerId = 5. I do that by finding all records that have Name = 'John' and CustomerId = '5' and the Group it on the documentId and count number of items in the group. If I got 2 then both Name = 'John' and CustomerId = '5' is true for this search. Return the documentId and use that to retrive information about the document, like the document archive storage id.
There should be a better SQL statement for this isn't there?
So my question is. Is there a better approche than these 2. Is the EAV-pattern so bad that I should stick with the first approche and have a Freaked out DBA and "ten millions of indexes"
We are talking about a system that will have around 10-20 millions of new records each month, and contain data for at least 3 years.... So the tables will be preatty big and good indexes are neccasary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- One per document
CREATE TABLE Document
(
DocumentId -- primary key
,DocumentType
,<etc>
)
-- One per "type" of document
CREATE TABLE DocumentType
(
DocumentTypeId -- pirmary key
,Name
)
-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
AttributeId -- primary key
,Name
)
-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
DocumentTypeId
,AttributeId
-- compound primary key on both columns
-- foeign keys on both columns
)
-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
DocumentId
,AttributeId
,Value
-- compound primary key on DocumentId, AttributeId
-- foeign keys on both columns ot their respective parent tables
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Documents and DocumentAttributes tables will ever be large. An index on on (AttributeId + Value) facilitiate lookups by attribute type, and depending on cardinality an index on (Value + AttributeId) could make searches for specific attributes quite efficient.
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT do.DocumentId
from Documents do
inner join DocumentAttributes da1
on da.Name = 'Name'
inner join DocumentAttributeValues dav1
on dav1.AttributeId = da1.AttributeId
and dav1.Value = 'John'
inner join DocumentAttributes da2
on da2.Name = 'CustomerId'
inner join DocumentAttributeValues dav2
on dav2.AttributeId = da2.AttributeId
and dav2.Value = '5'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
While further refinements might be possible, the more more attributes you’re filtering on, the uglier the queries will be. Five attributes max might work ok in SQL, but if you’ve got tons of attributes, a NoSQL solution might be what you’re looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns which allow you to define hundreds of columns even if only a subset are used at a time
Column Sets allow you to group these columns and treat them as a group
Filtered indexes can index only the rows that actually have values in them.
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others and SQL Server's Index tuning advisor can propose the indexes and statistics to use based on a trace captured using SQL Server's Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full Text Search. This will allow you to execute queries using arbitrary attributes. This is another technique using by document/content management systems, ERPs and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified select) and administration (no worries about column order in indexes, only one FTS index to manage)

Efficiently counting strength of relationship between rows in Postgres

I have a table that looks similar to this:
session_id | sku
------------|-----
a | 1
a | 2
a | 3
a | 4
b | 2
b | 3
c | 3
I want to pivot this into a table similar to this:
sku1 | sku2 | score
------|------|------
1 | 2 | 1
1 | 3 | 1
1 | 4 | 1
2 | 3 | 2
2 | 4 | 1
3 | 4 | 1
The idea is to store a denormalised table that allows one to look up for a given sku, what other skus are related to sessions it has been related to, and how many times both skus are related to the same session.
What algorithms, patterns or strategies could you suggest for implementing this in PostgreSQL or other technologies?
I realise that this kind of lookup can be done on the original table using counts, or using a facetting search engine. However, I want to make the reads more performant, and just want to keep the overall statistics. The idea is that I will be performing this pivot regularly on the newest few thousand rows in the first table, then storing the result in the second. I'm only concerned with approximate statistics for the second table.
I've got some SQL that works, but VERY slowly. Also looking into the potential for using a graph database of some sort, but wanted to avoid adding another technology for a small part of the app.
Update: The SQL below seems performant enough. I can convert 1.2 million rows in the first table (tags) into 250k rows in the second table (product_relations) with around 2-3k variations of sku in about 5 minutes on my iMac. I will realistically be denormalising only up to 10k rows per day. Question is whether this is actually the best approach. Seems a little dirty to me.
BEGIN;
CREATE
TEMPORARY TABLE working_tags(tag_id int, session_id varchar, sku varchar) ON COMMIT DROP;
INSERT INTO working_tags
SELECT id,
session_id,
sku
FROM tags
WHERE time < now() - interval '12 hours'
AND processed_product_relation IS NULL
AND sku IS NOT NULL LIMIT 200000;
CREATE
TEMPORARY TABLE working_relations (sku1 varchar, sku2 varchar, score int) ON COMMIT DROP;
INSERT INTO working_relations
SELECT a.sku AS sku1,
b.sku AS sku2,
count(DISTINCT a.session_id) AS score
FROM working_tags AS a
INNER JOIN working_tags AS b ON a.session_id = b.session_id
AND a.sku < b.sku
WHERE a.sku IS NOT NULL
AND b.sku IS NOT NULL
GROUP BY a.sku,
b.sku;
UPDATE product_relations
SET score = working_relations.score+product_relations.score
FROM working_relations
WHERE working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2;
INSERT INTO product_relations (sku1, sku2, score)
SELECT working_relations.sku1,
working_relations.sku2,
working_relations.score
FROM working_relations
LEFT OUTER JOIN product_relations ON (working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2)
WHERE product_relations.sku1 IS NULL;
UPDATE tags
SET processed_product_relation = TRUE
WHERE id IN
(SELECT tag_id
FROM working_tags);
COMMIT;
If I've interpreted your intention correctly (per comments) this should do it:
SELECT
s1.sku AS sku1,
s2.sku AS sku2,
count(session_id)
FROM session s1
INNER JOIN session s2 USING (session_id)
WHERE s1.sku < s2.sku
GROUP BY s1.sku, s2.sku
ORDER BY 1,2;
See: http://sqlfiddle.com/#!15/2e0b2/1
In other words: Self-join session, then find all pairings of SKUs for each session ID, excluding ones where the left is greater than or equal to the right in order to avoid repeating pairings - if we have (1,2,count) we don't want (2,1,count) as well. Then group by the SKU pairings and count how many rows are found for each pairing.
You may want to count(distinct session_id) instead, if your SKU pairings can repeat and you want to exclude duplicates. There will probably be more efficient ways to do that, but that's the simplest.
An index on at least session_id will be very useful. You may also want to mess with planner cost parameters to make sure it chooses a good plan - in particular, make sure effective_cache_size is accurate and random_page_cost vs seq_page_cost reflects your caching and I/O costs. Finally, throw as much work_mem at it as you can afford.
If you're creating a materialized view, just CREATE UNLOGGED TABLE whatever AS SELECT .... . That way you minimise the numer of writes/rewrites/overwrites.

Difference between a theta join, equijoin and natural join

I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?
A theta join allows for arbitrary comparison relationships (such as ≥).
An equijoin is a theta join using the equality operator.
A natural join is an equijoin on attributes that have the same name in each relationship.
Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms:
⋈ = πR,S-as ○ ⋈aR=aS
While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.
The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);
This isn't very useful in practice. Consider this example.
Product(PName, Price)
====================
Laptop, 1500
Car, 20000
Airplane, 3000000
Component(PName, CName, Cost)
=============================
Laptop, CPU, 500
Laptop, hdd, 300
Laptop, case, 700
Car, wheels, 1000
The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.
| PNAME | PRICE | CNAME | COST |
--------------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Laptop | 1500 | wheels | 1000 |
| Car | 20000 | CPU | 500 |
| Car | 20000 | hdd | 300 |
| Car | 20000 | case | 700 |
| Car | 20000 | wheels | 1000 |
| Airplane | 3000000 | CPU | 500 |
| Airplane | 3000000 | hdd | 300 |
| Airplane | 3000000 | case | 700 |
| Airplane | 3000000 | wheels | 1000 |
JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:
Product JOIN Component ON Pname
The associated SQL query would be like this (you can play with all the examples here)
SELECT *
FROM Product
JOIN Component
ON Product.Pname = Component.Pname
and the result:
| PNAME | PRICE | CNAME | COST |
----------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Car | 20000 | wheels | 1000 |
Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.
Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:
Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.
Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:
Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.
Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.
#outis's answer is good: concise and correct as regards relations.
However, the situation is slightly more complicated as regards SQL.
Consider the usual suppliers and parts database but implemented in SQL:
SELECT * FROM S NATURAL JOIN SP;
would return a resultset** with columns
SNO, SNAME, STATUS, CITY, PNO, QTY
The join is performed on the column with the same name in both tables, SNO. Note that the resultset has six columns and only contains one column for SNO.
Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S and SP are required):
SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;
The resultset will have seven columns, including two columns for SNO. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:
SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
or perhaps this
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
In other words, NATURAL JOIN in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT to SELECT DISTINCT yourself).
** I don't quite know what the result of SELECT * FROM table_expression; is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.
Natural is a subset of Equi which is a subset of Theta.
If I use the = sign on a theta join is it exactly the same as just
using a natural join???
Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)
This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.
Theta Join:
When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.
Equi Join:
When you make a query for join using equality operator only, then that join query comes under Equi join.
Example:
> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID;
> SELECT * FROM Emp INNER JOIN Dept USING(DeptID)
This will show:
_________________________________________________
| Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID |
| | | | |
Note: Equi join is also a theta join!
Natural Join:
a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.
Note: here, the join result has only one column for each pair of same named columns.
Example
SELECT * FROM Emp NATURAL JOIN Dept
This will show:
_______________________________
| DeptID | Emp.Name | Dept.Name |
| | | |
Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.
inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .
equi join of two tables are such that they display only those tuples which matches the value in other table . for example :
let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .
theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .
Natural Join: Natural join can be possible when there is at least one common attribute in two relations.
Theta Join: Theta join can be possible when two act on particular condition.
Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.

Can I do it only with SQL Queries or do I need a stored procedure?

I came across with a problem in which I need some advice/help.
Background:
I'm programming a map in an accounting software which requires special formatting and a tree-like hierarchy between the account codes, i.e.:
Account 1 is the father of accounts 11, 12 and 13;
On the other hand, account 11 is the father of accounts 111, 112 and 113;
There can also be an account relationship like this one: account 12 is the father of account 12111, because there is no 121 nor 1211, which means that the hierarchy is not a "one digit plus hierarchy" if I make my self clear;
And this logic continues till the end of the map.
Database table background:
There are 3 tables involved in all process.
A general ledger table (named ContasPoc) which I can relate with table number 2 (general ledger entries accounts) by each account code, in other words, the account code in table 1 is equal to the account code in table 2 (this field is named CodigoConta in both tables).
The general ledger entries accounts (named LancamentosContas) which I can relate with table number 1 (general ledger) as I said. This table is also related with table number 3 (general ledger entries) by an Id.
Table number 3 is as I said general ledger entries (named Lancamentos) and the field name that relates this table with number 2 is IdLancamento. I need this table for filtering purposes.
A real example:
I want all the accounts from 24 till 26 that have values/movements but I also want their parents. Moreover, their parents will have to contain the sums of their children accounts just like the example output bellow:
Account Code | Debit Sum | Credit Sum | Balance /*Balance = Debit Sum - Credit Sum*/
24 | 12,000 | 184,000 | -172,000 /*Sum of all children (242+243) */
242 | 12,000 | 48,000 | -36,000 /*Sum of all children (2423+2424) */
2423 | 12,000 | 0,000 | 12,000 /*Sum of all children (24231) */
24231 | 0,000 | 0,000 | 12,000 /*Account with values/movements */
2424 | 0,000 | 48,000 | -48,000 /*Sum of all children (24241) */
24241 | 0,000 | 48,000 | -48,000 /*Account with values/movements */
243 | 0,000 | 136,000 | -136,000 /*Sum of all children (2433) */
2433 | 0,000 | 136,000 | -136,000 /*Sum of all children (24331) */
24331 | 0,000 | 136,000 | -136,000 /*Sum of all children (243313) */
243313 | 0,000 | 136,000 | -136,000 /*Account with values/movements */
As you can see there are lots of accounts but only the ones that have the following comment /*Account with values/movements*/ have values, the others are calculated by me, which brings me to the real problem:
Can I do it? Of course, but at the moment I'm using recursive methods and several queries to do it and this approach takes a awful lot of time so I definitely need a new one because most of the customers use old computers.
I've tried to get this kind of structure/data using a query with INNER JOINS. Didn't work because I don't get all the data I need, only the lines with movements.
I also and tried with LEFT JOINS. Didn't work so well as well because I get all the data I need plus a lot of which I don't need. But using a LEFT JOIN has one more problem, I only get this extra data (also the one I need) if I do not include any fields from table number 3 in the WHERE clause. This obviously has an answer but because I'm not such an expert in SQL Server I'm not seeing what is the reason for that.
Here are the query I built (the other one has only 1 difference, a LEFT JOIN instead of the first INNER JOIN):
SELECT IdConta, Min(IdContaPai) AS IdContaPai, Min(ContasPoc.CodigoConta) AS CodigoConta
FROM ContasPoc INNER JOIN (LancamentosContas INNER JOIN Lancamentos ON
Lancamentos.IdLancamento = LancamentosContas.IdLancamento) ON ContasPoc.CodigoConta =
LancamentosContas.CodigoConta
WHERE Lancamentos.IdEmpresa=17 AND ContasPoc.IdEmpresa=17 AND Lancamentos.IdLancamento =
LancamentosContas.IdLancamento AND ContasPoc.CodigoConta>='24' AND ContasPoc.CodigoConta<='26'
GROUP BY IdConta
ORDER BY Min(ContasPoc.CodigoConta)
After this long explanation (sorry about that) my 3 questions are:
Is it possible to get such results (as I demonstrated above) only with database queries, in other words, without using recursive methods?
If not (most probable) are stored procedures the way to go and if so how can I do it (never used them before)?
Is there another method I'm not seeing?
Can someone help me on this one?
I hope I got all the relevant data in, but if not please tell me what I forgot and I will gladly answer back. Thanks in advance!
Miguel
WITH q (AccountCode, Balance) AS (
SELECT 24231, 12000
UNION ALL
SELECT 243313, -136000
UNION ALL
SELECT 24241, -48000
UNION ALL
SELECT 24, NULL
UNION ALL
SELECT 242, NULL
UNION ALL
SELECT 2423, NULL
)
SELECT qp.AccountCode, SUM(qc.Balance)
FROM q qp
JOIN q qc
ON SUBSTRING(CAST(qc.AccountCode AS VARCHAR), 1, LEN(qp.AccountCode)) = qp.AccountCode
GROUP BY
qp.AccountCode
ORDER BY
AccountCode
Unless I'm mistaken, this becomes easy if you think of Account Code as a string. You could query like:
select *,
(select sum(vw2.DebitSum)
from SomeView vw2
where vw.code < vw2.code and
vw2.code < cast(cast(vw.code as int) + 1 as varchar)
) as DebitSum
from SomeView vw
where '24' <= vw.code and vw.code < '27'
The subquery sums the debits for all children.
You never "need" stored procedures. Think of it this way, a stored procedure is just code with DB queries inside it which runs entirely on the server. You could always do that code in your client language and have the DB queries return result sets which you operate on on the client side.
i don't understand why you don't just have an account table with an id for the primary key, and for every child account have a parent account field. then you just reference back to the account table and get your hierarchy in a looping cte.

Resources