Is there a pattern in Cassandra for this task?

I've got 2 tables and basically everything's the same except that in the 1st table we've got PLAYER_ID as the last column and in the 2nd table we've got NUM_OF_PLAYERS instead.
So here's what I've got:
BIRTHDAYS_TABLE:
int year, PLAYER_ID
1988, 12312321 which stands for 'Messi'
1988, 5541 which stands for 'Some other footballer'
1989, 12312322 which stands for 'CR7' etc
And then once in a while (once a year for this example) I want to 'cache' these results:
NUM_OF_PLAYERS_TABLE:
int year, int NUM_OF_PLAYERS
1988, 2
And the only query I support is "give me the number of players who were born in X". Is there any solution in Cassandra so that I don't have to reinvent the wheel and create some sort of scheduler (which runs once a year in this example) that deletes rows from the 1st table and adds the count() to the 2nd one?
Obviously my tables aren't that simple, but I believe that the idea is the same.

You can do it with count aggregation, like this:
select year, count(*) from first_table where year=1981;
then grab the value returned as count and insert it into your 2nd table.
The actual implementation will depend on the programming language you're using.
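As a rough sketch of those two steps in Python: the function below assumes a session object shaped like the DataStax cassandra-driver Session (an execute(query, params) method whose result offers .one()), and it reuses the table and column names from the question. The function name is made up for illustration, and how you schedule it or whether you delete the counted rows afterwards is left open.

```python
def cache_player_count(session, year):
    """Count the players born in `year` and store the total in the rollup table."""
    # Step 1: aggregate inside one partition (assumes `year` is the partition key).
    row = session.execute(
        "SELECT count(*) FROM birthdays_table WHERE year = %s", (year,)
    ).one()
    count = row[0]

    # Step 2: write the cached value. A Cassandra INSERT overwrites any existing
    # row with the same key, so running the job twice for a year is harmless.
    session.execute(
        "INSERT INTO num_of_players_table (year, num_of_players) VALUES (%s, %s)",
        (year, count),
    )
    return count
```

You'd call it from whatever runs your yearly job, e.g. cache_player_count(session, 1988).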

Related

How to execute a "process/function" in SQL Server

Imagine I have a database table with some columns, n columns and n rows, and one of that columns is a date (YY-MM-DD hh:mm:ss)
So I need to take the current date; I know there is a function called CURRENT_DATE.
And I want to apply some "logic" to the current date and the date stored in each row (the date column mentioned above): compare the years and months between them, and if the difference between the two is X months, return that row; if not, don't return it.
So, simply: return everything in the DB that satisfies that "logic", and leave out everything that doesn't.
The problem is where to put that logic in a SQL query; I don't think I really can. Can I do what I want with plain SQL, or do I need something else?
Example Data:
So if I want the query to return only the rows where the difference between the current date and the row's date column is less than 3 months, for example,
it should return Google, Amazon, Twitter, YouTube and Microsoft

Unless I'm missing something obvious here, you've just really, really over-complicated a simple where clause:
SELECT A, B, C -- Please tell me these are not your actual column names!
FROM TableName
WHERE C >= DATEADD(MONTH, -3, GETDATE())
AND C <= DATEADD(MONTH, 3, GETDATE()) -- Assuming future dates are also in the table
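If you want to try the same window filter without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. Everything in it is a stand-in: the companies table, its columns, and the sample rows are hypothetical, SQLite's date(ref, '-3 months') plays the role of DATEADD(MONTH, -3, GETDATE()), and the reference date is passed in instead of read from the clock so the result is repeatable.

```python
import sqlite3

def rows_within_months(conn, reference_date, months=3):
    """Return names whose event_date lies within +/- `months` of reference_date."""
    sql = """
        SELECT name
        FROM companies
        WHERE event_date >= date(:ref, :back)
          AND event_date <= date(:ref, :fwd)
        ORDER BY name
    """
    params = {
        "ref": reference_date,
        "back": f"-{months} months",  # SQLite date modifier, e.g. '-3 months'
        "fwd": f"+{months} months",
    }
    return [r[0] for r in conn.execute(sql, params)]

# Hypothetical sample data; dates are ISO strings so they compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Recent", "2024-05-01"), ("Upcoming", "2024-08-20"), ("Old", "2023-01-10")],
)
print(rows_within_months(conn, "2024-06-15"))
```

The point is the same as in the T-SQL above: the "logic" is just two comparisons in the WHERE clause, with the date arithmetic applied to the current date rather than to every row.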

multiple condition on joined table

I have one small database for exercise, please see below ER-Diagram
I want to write a query that lists students' last and first names and majors for students who had at least one high grade (>= 3.5) in at least one course offered in fall of 2012.
My code below:
select s.StdNo,s.StdFirstName,s.StdLastName,s.StdMajor,e.EnrGrade,o.OfferNo,o.OffYear
from Enrollment e
join Offering o on e.OfferNo=o.OfferNo
join Student s on s.StdNo=e.StdNo
where e.EnrGrade >=3.5 and o.OffYear="2010";
But I got an SQL Error
[207] [S0001]: Invalid column name '2010'
I am confused about the error: the value "2010" is NOT a column name; OffYear is the column. So why did this happen?
The basic query is not that hard, but I am stuck on the (multiple) nested query.

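OffYear is shown as a number, so you should compare against the number 2010, not "2010". In SQL Server, double quotes delimit identifiers (with the default QUOTED_IDENTIFIER setting), which is why "2010" was read as a column name; string literals take single quotes. So: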
[...] and Offyear = 2010

DB Schema: Versioned price model vs invoice-related data

I am creating some db model for rental invoice generation.
The invoice consists of N booking time ranges.
Each booking belongs to a price model. A price model is a set of rules which determine a final price (base price + season price + quantity discount + ...).
That means the final price for the N bookings within an invoice can be a complex calculation, and of course I want to keep track of every aspect of the final price calculation for later review of an invoice.
The problem is, that a price model can change in the future. So upon invoice generation, there are two possibilities:
(a) Never change a price model. Just make it immutable by versioning it and refer to a concrete version from an invoice.
(b) Put all the price information, discounts and extras into the invoice. That would mean a lot of data, as an invoice contains N bookings which may be partly in the range of a season price.
Basically, I would break down each booking into its days and for each day I would have N rows calculating the base price, discounts and extra fees.
Possible table model:
Invoice
id: int
InvoiceBooking # Each booking. One invoice has N bookings
id: int
invoiceId: int
(other data, e.g. guest information)
InvoiceBookingDay # Days of a booking. Each booking has N days
id: int
invoiceBookingId: int
date: date
InvoiceBookingDayPriceItem # Concrete discounts, etc. One day has many items
id: int
invoiceBookingDayId: int
price: decimal
title: string
My question is, which way should I prefer and why.
My considerations:
With solution (a), the invoice would be re-calculated using the price model information each time the data is viewed. I don't like this, as algorithms can change. It does not feel natural for the "read-only" nature of an invoice.
Also the version handling of price models is not a trivial task and the user needs to know about the version concept, which adds application complexity.
With solution (b), I generate a bunch of nested data and it adds a lot of complexity to the schema.
Which way would you prefer? Am I missing something?
Thank you

There is a third option which I recommend. I call it temporal (time) versioning and the layout of the table is really quite simple. You don't describe your pricing data so I'll just show a simple example.
Table: DailyPricing
ID EffDate Price ...
A 01/01/2015 17.50 ...
B 01/01/2015 20.00 ...
C 01/01/2015 22.50 ...
B 01/01/2016 19.50 ...
C 07/01/2016 24.00 ...
This shows that all three price schedules (A, B and C just represent whatever method you use to distinguish between price levels) were given a price on Jan 1, 2015. On Jan 1, 2016, the price of plan B was reduced. In July, the price of plan C was increased.
To get the current price of a plan, the query is this:
select dp.Price
from DailyPricing dp
where dp.ID = 'A'
and dp.Effdate =(
select Max( dp2.EffDate )
from DailyPricing dp2
where dp2.ID = dp.ID
and dp2.EffDate <= :DateOfInterest);
The DateOfInterest variable would be loaded with the current date/time. This query returns the one price that is currently in effect. In this case, the price set Jan 1, 2015 as that has never changed since taking effect. If the search had been for plan B, the price set on Jan 1, 2016 would have been returned and for plan C, the price set on July 1, 2016. These are the latest prices set for each plan; that is, the current prices.
Such a query would more likely be in a join with probably the invoice table so you could perform the price calculation.
select ...
from Invoices i
join DailyPricing dp
on dp.ID = i.ID
and dp.Effdate =(
select Max( dp2.EffDate )
from DailyPricing dp2
where dp2.ID = dp.ID
and dp2.EffDate <= i.InvoiceDate )
where i.ID = 1234;
This is a little more complex than a simple query but you are asking for more complex data (or, rather, a more complex view of the data). However, this calculation is probably only executed once and the final price stored back into the invoice data or elsewhere.
It would be calculated again only if the customer made some changes or you were going through an audit, rechecking the calculation for accuracy.
Notice something, however, that is subtle but very important. If the query above were being executed for an invoice that had just been created, the InvoiceDate would be the current date and the price returned would be the current price. If, however, the query was being run as a verification on an invoice that was two years old, the InvoiceDate would be two years ago and the price returned would be the price that was in effect two years ago.
In other words, the query to return current data and the query to return past data is the same query.
That is because current data and past data remain in the same table, differentiated only by the date the data takes effect. This, I think, is about the simplest solution to what you want to do.
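To make the lookup concrete, here is a minimal sketch that loads the DailyPricing rows from the example into an in-memory SQLite database (via Python's sqlite3) and asks for the price in effect on a given date. The price_on helper is a name invented for this illustration, and the dates are stored as ISO strings so they compare correctly. The subquery picks the latest EffDate on or before the date of interest, which is why the comparison is <=.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DailyPricing (ID TEXT, EffDate TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO DailyPricing VALUES (?, ?, ?)",
    [
        ("A", "2015-01-01", 17.50),
        ("B", "2015-01-01", 20.00),
        ("C", "2015-01-01", 22.50),
        ("B", "2016-01-01", 19.50),  # plan B reduced
        ("C", "2016-07-01", 24.00),  # plan C increased
    ],
)

def price_on(conn, plan_id, date_of_interest):
    """Return the price of `plan_id` in effect on `date_of_interest`, or None."""
    row = conn.execute(
        """
        SELECT dp.Price
        FROM DailyPricing dp
        WHERE dp.ID = :plan
          AND dp.EffDate = (
              SELECT MAX(dp2.EffDate)
              FROM DailyPricing dp2
              WHERE dp2.ID = dp.ID
                AND dp2.EffDate <= :doi
          )
        """,
        {"plan": plan_id, "doi": date_of_interest},
    ).fetchone()
    return row[0] if row else None

print(price_on(conn, "B", "2015-06-01"))  # before the 2016 change
print(price_on(conn, "B", "2016-03-01"))  # after the 2016 change
```

The same function serves both "current price" and "price two years ago"; only the date passed in changes. A date earlier than any EffDate for the plan yields None, since no price was in effect yet.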

How about A and B?
It's not best practice to re-calculate any component of an invoice, especially if the component was printed. An invoice and invoice details should be immutable, and you should be able to reproduce it without re-calculating.
If you ever have a problem with figuring out how you got to a certain amount, or if there is a bug in your program, you'll be glad you have the details, especially if the calculations are complex.
Also, it's a good idea to keep a history of your pricing models so you can validate how you got to a certain price. You can make this simple to your users. They don't have to see the history -- but you should record their changes in the history log.

Query in relational algebra without using aggregate functions

Task from exam for subject Database Systems:
I have following schema:
Excavator(EID, Type) - EID is a key
Company(Name, HQLocation) - Name is a key
Work(Name, EID, Site, Date) - All columns together form a key
I have to write this query in relational algebra:
"Which company was digging on exactly one site on 1st of May?"
I don't know how to express it without aggregate functions (count). I know that people add these functions to relational algebra but we were forbidden to do it during this exam.
You can use standard set operations, division, projection, selection, join, cartesian product.

I forget the proper relational algebra syntax now but you can do
(Worked on >= 1 site on 1st May)
minus (Worked on > 1 site on 1st May)
--------------------------------------
equals (Worked on 1 site on 1st May)
A SQL solution using only the operators mentioned in the comments (and assuming rename) is below.
SELECT Name
FROM Work
WHERE Date = '1st May' /*Worked on at least one site on 1st May */
EXCEPT
SELECT W1.Name /*Worked more than one site on 1st May */
FROM Work W1
CROSS JOIN Work W2
WHERE W1.Name = W2.Name
AND W1.Date = '1st May'
AND W2.Date = '1st May'
AND W1.Site <> W2.Site
I assume this will be relatively straightforward to translate.
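For anyone who wants to check the set-difference idea on data, here is a small sketch using Python's sqlite3 module. The rows are invented for illustration (the company names, sites, and ISO-formatted dates are all assumptions), and the second branch compares W1.Site with W2.Site so that it only matches companies that dug on two different sites that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Work (Name TEXT, EID INTEGER, Site TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Work VALUES (?, ?, ?, ?)",
    [
        ("Acme", 1, "North", "2024-05-01"),
        ("Acme", 2, "North", "2024-05-01"),    # two excavators, same site
        ("Digco", 3, "North", "2024-05-01"),
        ("Digco", 4, "South", "2024-05-01"),   # two different sites
        ("Earthly", 5, "East", "2024-04-30"),  # different day
    ],
)

def exactly_one_site(conn, day):
    """Companies that worked on at least one site minus those on more than one."""
    sql = """
        SELECT Name FROM Work WHERE Date = :day
        EXCEPT
        SELECT W1.Name
        FROM Work W1
        CROSS JOIN Work W2
        WHERE W1.Name = W2.Name
          AND W1.Date = :day
          AND W2.Date = :day
          AND W1.Site <> W2.Site
    """
    return sorted(r[0] for r in conn.execute(sql, {"day": day}))

print(exactly_one_site(conn, "2024-05-01"))
```

Note that Acme still counts as "exactly one site" even though it appears twice on that day, because both rows share the same site and the self-join only pairs rows with different sites.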

Is there a difference between id,date or date,id index in sql server

I'm creating an index in sql server 2005 and the discussion with a coworker is if it makes a difference between index key columns being id and date vs date then id.
Is there a fundamental difference in the way the index would be created in either scenario?
Would it make a difference in other versions of SQL server?
Thanks

Yes, definitely. Does anyone ever query the table for JUST date or JUST id? An index of date,id can be used to look up just date, but not just id, and vice versa.
Using date,id:
Jan 1 4
Jan 1 7
Jan 2 6
Jan 2 9
Jan 2 33
Jan 3 23
Jan 4 1
Using id,date:
1 Jan 4
4 Jan 1
6 Jan 2
7 Jan 1
9 Jan 2
23 Jan 3
33 Jan 2
If your WHERE clause or a JOIN in your query is using both date and id, then either index is fine. But you can see that if you're doing a lookup just by date, the first index is useful for that, but the second one is totally random.
In a more general sense, an index on A, B, C, D is going to be useful for queries on A,B,C,D, OR A,B,C OR A,B OR just A.
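You can watch a query planner make this distinction. The sketch below uses SQLite (through Python's sqlite3) rather than SQL Server, so treat it as an illustration of the leading-column rule and not of SQL Server's actual plans; the events table, its columns, and the index name are all made up. It asks SQLite to explain two queries against a table that has only an index on (event_date, id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, id INTEGER, payload TEXT)")
conn.execute("CREATE INDEX ix_date_id ON events (event_date, id)")

def plan(conn, where_clause):
    """Return SQLite's one-line plan description for a filtered SELECT."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT payload FROM events WHERE {where_clause}"
    ).fetchall()
    return rows[0][-1]  # the last column holds the human-readable detail

# Filter on the leading index column: the index can be searched.
print(plan(conn, "event_date = '2024-01-02'"))
# Filter only on the second index column: the planner falls back to a scan.
print(plan(conn, "id = 6"))
```

In SQLite the first plan should mention a SEARCH using ix_date_id and the second a SCAN of the table. The exact wording varies between SQLite versions, and other engines (SQL Server included) may choose to scan the index itself instead of the table, but the seek is only available when the leading column is constrained.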

The order of columns does matter when it comes to indexes. Whether or not it'll matter in your case depends.
Let me explain.
Let's say you have a person table, with first, last, and middle name.
So you create this index, with the columns in the following order:
FirstName, MiddleName, LastName
Now, let's say you now do a query using a WHERE on all of those columns. It'll use the entire index.
But let's say you only query on first and last name. What happens now is that, while it will still use the index, it will grab the range of the index that has the same first name as your WHERE clause, then scan those, retrieving those that have a matching last name. Note that it will scan all the rows with the same first name.
However, if you had rearranged the index, like this:
FirstName, LastName, MiddleName
Then the above query would grab the range of the index that has the same first and last name, and retrieve those.
It's easier to grasp if you look at it in another way.
The phone book is sorted by last name, then first name and middle name. If you had put middle name between last name and first name, and sorted, then people with the same first and last name would seemingly be all over the place, simply because you sorted on middle name before first name.
Hence, if you're looking for my name, which is "Lasse Vågsæther Karlsen", you'll find all the Karlsen-people, we would be located in a sequential list in the phone book, but my name would be seemingly randomly placed, simply because the list would then be sorted by Vågsæther.
So an index can be used, even if the query doesn't use all the columns in the index, but the quick lookup-features only work as long as the columns are listed at the front of the index. Once you skip a column, some kind of scan takes place.
Now, if all your queries use both id and date, it won't matter much, but if all the queries include date, and only some of them contain an id, then I'd put date first, and id second, this way the index would be used in more cases.

Yes, it does matter. Suppose you create an index on columns (A, B). You can do a SELECT with a WHERE clause including both columns and the index can be used. The index will also be used if you do a SELECT with a WHERE that only includes column A. But if you do a SELECT with a WHERE that only includes column B, the index can't be used.
See here for more info.
