To start, here is some sample data.
Sample Input
ID  Date                 Value
10  2012-06-01 00:01:45  20
10  2012-06-01 00:01:51  12
10  2012-06-01 00:01:56  21
10  2012-06-01 00:02:01  43
10  2012-06-01 00:02:06  12
17  2012-06-01 00:02:43  64
17  2012-06-01 00:02:47  53
17  2012-06-01 00:02:52  23
17  2012-06-01 00:02:58  45
17  2012-06-01 00:03:03  34
Desired Output
ID  FirstDate            LastDate             FirstValue  LastValue
10  2012-06-01 00:01:45  2012-06-01 00:02:06  20          12
17  2012-06-01 00:02:43  2012-06-01 00:03:03  64          34
So I am looking to get the first and last date, and the values for both, into a single line. The ID value in my table will also have other entries at later dates, so I only want the first and last of a chain of entries. Entries in a chain are 5 seconds apart; if the gap is greater than that, it is a new chain.
Any suggestions?
Thanks
I'm just beginning the search process on this, but it looks like LATERAL VIEW and EXPLODE, coupled with maybe a user-defined function or two, are your friends.
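If your Hive version has window functions (0.11+; CTEs need 0.13+), a gaps-and-islands query is another option. The sketch below is untested and assumes table and column names (mytable, id, dt, value); adjust to your schema:

WITH flagged AS (
  SELECT id, dt, value,
         -- 1 marks the start of a new chain: a gap of more than 5 seconds
         CASE WHEN unix_timestamp(dt) - LAG(unix_timestamp(dt))
                   OVER (PARTITION BY id ORDER BY dt) > 5
              THEN 1 ELSE 0 END AS new_chain
  FROM mytable
),
chains AS (
  SELECT id, dt, value,
         -- running sum of the flags numbers the chains within each id
         SUM(new_chain) OVER (PARTITION BY id ORDER BY dt) AS chain_id
  FROM flagged
),
marked AS (
  SELECT id, chain_id, dt,
         FIRST_VALUE(value) OVER (PARTITION BY id, chain_id ORDER BY dt) AS first_value,
         LAST_VALUE(value)  OVER (PARTITION BY id, chain_id ORDER BY dt
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS last_value
  FROM chains
)
SELECT id,
       MIN(dt)          AS first_date,
       MAX(dt)          AS last_date,
       MIN(first_value) AS first_value,  -- constant within a chain
       MAX(last_value)  AS last_value    -- constant within a chain
FROM marked
GROUP BY id, chain_id;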
I ended up creating a MapReduce job to work on the CSV files of my data instead of using Hive.
I mapped based on ID, then set a parameter so that if entries were more than 2 hours apart I separated them into different chains.
In the end it was easier to hack the MapReduce code than to ponder Hive queries.
This might be a strange question... but I will try to explain the best I can...
BTW: there is no chance of implementing this through stored procedures... it should be done in a SQL query only... But if the only option is an SP, then I will have to adapt to that...
I have a table with the following elements:

RUN   WORKORDER  LOCATION    TRAVELTIME  NUMEQUIP  TOT_TIME
NO99  1          Start
NO99  2          Customer 1  112         1         8
NO99  3          Customer 2  18          11        88
NO99  4          Customer 3  22          93        744
NO99  5          Customer 4  34          3         24
I need to add a running DATE and HOUR by calculating the amount of time it takes from one line to another BUT, and this is important, taking into consideration working hours (from 9:00 to 13:00 and from 14:00 to 18:00; in US format, from 9am to 1pm and 2pm to 6pm)... As an example, considering that my start date and time would be 10/May/2022 9:00:
RUN   WORKORDER  LOCATION    TRAVELTIME  NUMEQUIP  TOT_TIME  DATE      TIME
NO99  1          Start                                       10/05/22  9:00
NO99  2          Customer 1  112         1         8         10/05/22  10:52
NO99  3          Customer 2  18          11        88        10/05/22  11:18
NO99  4          Customer 3  22          93        744       10/05/22  14:08
NO99  5          Customer 4  34          3         24        12/05/22  10:06
This result is achieved by calculating the estimated time of the trip between customers (TRAVELTIME) and, after arriving, adding the time spent on maintenance (TOT_TIME, which is the number of equipments (NUMEQUIP) times 8 minutes per equipment)... Since customer 3 will have 744 minutes (12 hours and 24 minutes) of maintenance, and those minutes span 3 days of working hours (232 minutes after the 14:08 arrival on day 1, the full 480 on day 2, and the last 32 ending at 9:32 on day 3, followed by the 34-minute trip to customer 4), the result should be as shown...
With the following query I can get almost the desired effect... but it cannot restrict itself to work hours only... all time is treated as continuous...
-- temprunningtime cannot be referenced in the same SELECT that defines it,
-- so the running total is computed in a CTE first
WITH running AS (
    SELECT RUN, WORKORDER, LOCATION, TRAVELTIME, NUMEQUIP,
           NUMEQUIP * 8 AS TOT_TIME,
           SUM(TRAVELTIME + TOT_TIME)
               OVER (ORDER BY WORKORDER) AS temprunningtime  -- ordered by the sequence column
    FROM MYTABLE
)
SELECT RUN, WORKORDER, LOCATION, TRAVELTIME, NUMEQUIP, TOT_TIME,
       DATEADD(mi, temprunningtime - TOT_TIME, '9:00') AS TIME
FROM running
With this query (slightly altered) I get a running TIME, but it does not take into account the 13:00-14:00 stop or the 18:00-9:00 stop...
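One way to handle the breaks (a hedged sketch only, not a tested solution: the 9:00-13:00 / 14:00-18:00 windows come from the question, everything else, including the function name, is an assumption) is a scalar function that advances a datetime by a number of working minutes, skipping the lunch break and the overnight stop:

CREATE FUNCTION dbo.AddWorkingMinutes (@start DATETIME, @minutes INT)
RETURNS DATETIME
AS
BEGIN
    -- working windows: 9:00-13:00 and 14:00-18:00 (480 working minutes per day)
    DECLARE @current DATETIME = @start, @left INT;
    WHILE @minutes > 0
    BEGIN
        -- minutes remaining in the current window (until 13:00 or 18:00)
        SET @left = CASE
            WHEN CAST(@current AS TIME) < '13:00'
                THEN DATEDIFF(mi, @current, DATEADD(hh, 13, CAST(CAST(@current AS DATE) AS DATETIME)))
            ELSE DATEDIFF(mi, @current, DATEADD(hh, 18, CAST(CAST(@current AS DATE) AS DATETIME)))
        END;
        IF @minutes <= @left
        BEGIN
            SET @current = DATEADD(mi, @minutes, @current);
            SET @minutes = 0;
        END
        ELSE
        BEGIN
            SET @minutes = @minutes - @left;
            -- jump over the lunch break, or to 9:00 the next morning
            SET @current = CASE
                WHEN CAST(@current AS TIME) < '13:00'
                    THEN DATEADD(hh, 14, CAST(CAST(@current AS DATE) AS DATETIME))
                ELSE DATEADD(hh, 33, CAST(CAST(@current AS DATE) AS DATETIME))
            END;
        END
    END
    RETURN @current;
END

With that in place, dbo.AddWorkingMinutes('20220510 09:00', temprunningtime - TOT_TIME) would replace the plain DATEADD in the query above; for the sample data it reproduces the 14:08 arrival at customer 3 and the 12/05 10:06 arrival at customer 4. Note it assumes the start already lies inside a working window and does not handle weekends.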
It might be a bit confusing, but any ideas on this would be very much appreciated... and I will try to explain any way I can...
I have a data source with data formatted like this:
ID  Visits  Charges  Date        Location
33  21      375      2022-01-29  A
34  4285    4400     2022-01-29  B
35  12      2165     2022-01-29  C
36  31      4285     2022-01-30  A
37  40      5881     2022-01-31  A
38  29      4715     2022-01-31  B
39  8       1390     2022-01-31  C
I want to get the aggregated visits of all locations per day, and from there get the max value of a day for the time period chosen by the user, on a scorecard and a table. At the moment, when I choose the max value of the Visits metric it only gives me the max value of the column (4285), not the max of the data aggregated per day.
The value I am looking for, in the time period between 28-01 and 31-01, should be 4318 (the sum of all 3 locations for 29-01, which is the highest of the 3 days).
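(In SQL terms, just to make the intended aggregation concrete, the value would be computed like the sketch below; the table name visits_table is made up.)

SELECT MAX(daily_visits) AS max_daily_visits
FROM (
    SELECT Date, SUM(Visits) AS daily_visits  -- first: sum over all locations per day
    FROM visits_table
    WHERE Date BETWEEN '2022-01-28' AND '2022-01-31'
    GROUP BY Date
) per_day;                                    -- then: the largest daily sum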
Thanks!
What I may suggest is to use a pivot table, like this:
Choose Date as your row dimension, then choose Visits as the metric (aggregation set to SUM).
Remember to sort this table by Visits in descending order; your maximum value should be on top. If you want to see only this maximum value, you can change the size of your pivot table to keep only the first value visible.
This should work with additional controls too.
I have students' data with the fees paid for each program, and I want to show the outstanding fees. Since a student could have outstanding fees pending for 2018, 2019 and 2020, that student will have 3 rows (the months are in columns). Because the student is the same, I will be clubbing the records together in the front end. Now if I consider pagination with a 10-per-page limit, and 3 of those 10 records belong to the same student (for different years), I will end up showing just 7 records on that page.
Here's the sample data.
Studentname RollNo Year Program Jan Feb Mar Apr May Jun ...
abc 1 2018 p1 200 50 10 30 88 29
abc 1 2019 p1 100 10 20 50 12 22
abc 1 2020 p1 30 77 33 27 99 100
xyz 2 2020 p2 88 29 32 99 199 200
How could I manage pagination for the above case?
Assuming your front end is HTML/CSS/JavaScript:
You don't need to handle pagination in your query, or even your backend, at all. Everything can and should be done on your frontend. I would suggest using jQuery and Bootstrap to create a paginated table to display your data, using Material Design for Bootstrap.
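If you do end up wanting pagination in the query after all, a common alternative is to page by distinct student rather than by row, so a student whose years span several rows is never split across a page boundary. A rough sketch (the table name fees is assumed):

WITH ranked AS (
    SELECT *,
           -- numbers distinct students, so all years of a student share one rank
           DENSE_RANK() OVER (ORDER BY RollNo) AS student_rank
    FROM fees
)
SELECT *
FROM ranked
WHERE student_rank BETWEEN 1 AND 10;  -- page 1, 10 students per page

Page n then filters on student_rank BETWEEN (n - 1) * 10 + 1 AND n * 10.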
I'm using COLUMNS_UPDATED() in a trigger to identify those columns whose values should be written to an audit table. The trigger / auditing had been working fine for multiple years. I noticed yesterday that the auditing is no longer working consistently.
I've listed the first forty columns of the table in question at the bottom for reference, along with the ORDINAL_POSITION from INFORMATION_SCHEMA.COLUMNS. The table has a total of 109 columns.
I added print COLUMNS_UPDATED() to my trigger to get some debug info.
When I update CurrentOnFleaTick, the 9th column, I see this printed:
0x0001000000000000000000000000
This is expected - the 9th column should be represented as the least significant bit of the second byte. Similarly, if I update HasAttackedAnotherAnimalExplanation I see this:
0x0000010000000000000000000000
Again, expected - the 17th column should be represented as the least significant bit of the third byte.
But... when I update HouseholdIncludesCats, I see this:
0x0000000200000000000000000000
Not expected! Where you see the 2 there should be a 1, as HouseholdIncludesCats ordinal position is 25, making it the first column represented in the fourth byte, which should be represented in the least significant bit of that byte.
I narrowed things down by updating every column between HasAttackedAnotherAnimalExplanation and HouseholdIncludesCats and found that the 'off by one' problem I'm having starts with HouseTrainedId, ordinal position 24. When updating HouseTrainedId I'm expecting
0x0000800000000000000000000000
but instead I get
0x0000000100000000000000000000
which I believe is wrong, and it is what I expect to be getting for updates to the HouseholdIncludesCats column.
I do not believe the mask should skip ahead. The mask is currently not using the most significant bit of the 3rd byte.
I did recently drop a column, but I don't have a record of its ordinal position. Based on the original code that would have created the table, I believe the ordinal position of the column that was dropped was NOT 24. (I think it was 7... It had been defined after the BreedIds.)
I'm not necessarily looking for a deep root cause determination. If there was something I could do to reset whatever internal data SQL Server uses that'd be fine. Sort of like a rebuild index idea for table metadata? Is there something like that that might fix this?
Thanks in advance for helpful answers! :)
COLUMN_NAME ORDINAL_POSITION
PetId 1
AdopterUserId 2
AdoptionDeadline 3
AgeMonths 4
AgeYears 5
BreedIds 6
Color 7
CreatedOn 8
CurrentOnFleaTick 9
CurrentOnHeartworm 10
CurrentOnVaccinations 11
FoodTypeId 12
GenderId 13
GuardianForMonths 14
GuardianForYears 15
HairCoatLength 16
HasAttackedAnotherAnimalExplanation 17
HasAttackedAnotherAnimalId 18
HasBeenReferredByShelter 19
HasHadTraining 20
HasMedicalConditions 21
HasRecentlyBittenExplanation 22
HasRecentlyBittenId 23
HouseTrainedId 24
HouseholdIncludesCats 25
HouseholdIncludesChildren5to10 26
HouseholdIncludesChildrenUnder5 27
HouseholdIncludesDogs 28
HouseholdIncludesOlderChildren 29
HouseholdIncludesOtherPets 30
HouseholdOtherPets 31
KnowsCommandDown 32
KnowsCommandPaw 33
KnowsCommandSit 34
KnowsCommandStay 35
KnowsOtherCommands 36
LastUpdatedOn 37
LastVisitedVetOn 38
ListingCodeId 39
LitterTypeClumping 40
So... I thought I had googled enough before posting this, but I guess I hadn't. I found this:
https://www.sqlservercentral.com/forums/topic/columns_updated-and-phantom-fields
Using COLUMNPROPERTY() to get the ColumnID is definitely the way to go: COLUMNS_UPDATED() maps bits by the column's ColumnID (sys.columns.column_id), which keeps its gap when a column is dropped, whereas ORDINAL_POSITION in INFORMATION_SCHEMA.COLUMNS is renumbered sequentially, so the two drift apart after a drop.
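For anyone landing here, a minimal sketch of the fix as I understand it (the table name dbo.Pets is made up; this belongs inside the trigger body, the only place COLUMNS_UPDATED() is valid):

DECLARE @colId INT = COLUMNPROPERTY(OBJECT_ID('dbo.Pets'),
                                    'HouseholdIncludesCats', 'ColumnId');
-- locate the column's byte and bit within the COLUMNS_UPDATED() varbinary
DECLARE @byte INT = (@colId - 1) / 8 + 1;
DECLARE @bit  INT = POWER(2, (@colId - 1) % 8);
IF (SUBSTRING(COLUMNS_UPDATED(), @byte, 1) & @bit) > 0
    PRINT 'HouseholdIncludesCats was updated';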
I have this table to normalize for a uni project; every time I think it should just
be two tables, I then think no, it should be three... I am going to throw this out to you guys' superior knowledge, as maybe you can indicate the best way it should be done and why.
Number  Type  Single rate  Double rate  Family rate
1       D     56           72
2       D     56           72
3       T     50           72
4       T     50           72
5       S     48
6       S     48
7       S     48
8       T     50           72
9       T     50           72
10      D     56           72
11      D     56           72
12      D     56           72
13      D     56           72
14      F     56           72           84
15      F     56           72           84
16      S     48
17      S     48
18      T     50           72
20      D     56           72
Many thanks to anyone who can help me see the correct way.
It is not possible to produce a correct table design unless one understands exactly what the columns mean and how the data columns depend on one another. However, here is an attempt that can be refined once you provide more information. The naming is not as good as I'd like it to be but, as I said, the purpose is not clear from the question. Anyway, this is a start; I hope it helps you.
Also note that normalization is not always required for all types of applications. For example, Business Intelligence systems often use schemas that are deliberately not fully normalized (e.g. a star schema). So the database design may sometimes depend on the nature of the application and how the data changes.
Main
----
MainID int PK
MainTypeID Char(1) Example: D, T, S etc.
MainRateIntersectionID Int
MainRateIntersection
--------------------
MainRateIntersectionID int PK
MainID int
RateCategoryID int
The combination of MainID and RateCategoryID should be constrained
with a UNIQUE index.
RateCategory
------------
RateCategoryID int PK
RateCategoryText Varchar2(15) Not Null Example:Single, Family, etc.
RateValue Int Nullable
MainType
---------
MainTypeID Char(1) PK
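As a rough DDL sketch of the above (generic types assumed; I have left MainRateIntersectionID out of Main, since the intersection table already references Main and keeping both would make the relationship circular):

CREATE TABLE MainType (
    MainTypeID CHAR(1) PRIMARY KEY            -- e.g. 'D', 'T', 'S', 'F'
);

CREATE TABLE Main (
    MainID     INT PRIMARY KEY,               -- the room Number from the sample data
    MainTypeID CHAR(1) NOT NULL REFERENCES MainType (MainTypeID)
);

CREATE TABLE RateCategory (
    RateCategoryID   INT PRIMARY KEY,
    RateCategoryText VARCHAR(15) NOT NULL,    -- e.g. 'Single', 'Double', 'Family'
    RateValue        INT NULL
);

CREATE TABLE MainRateIntersection (
    MainRateIntersectionID INT PRIMARY KEY,
    MainID         INT NOT NULL REFERENCES Main (MainID),
    RateCategoryID INT NOT NULL REFERENCES RateCategory (RateCategoryID),
    UNIQUE (MainID, RateCategoryID)           -- one rate row per room and category
);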
Edit
Based on the new information, I have revised the model. I have removed the 'artificial' IDs since this is a training project for normalization; artificial IDs (surrogate keys) are fine to add, but I guess they are not your objective. I have added a bookings table, where a row would be inserted for each customer that makes a booking; you need to add the appropriate customer information to that table. The table you provided is more of a logical view that could be returned from a query than a physical table to store and update in the database. Instead, the bookings table should be used.
I hope this helps you.