Google Analytics (GA4) transaction data not accurate

1. Background
Using GA4 without GA360
2. Description
When I search the records at the transaction level (record by datetime), which shows the detailed records one by one, the event count varies depending on the filtered date range.
3. Hope to Achieve
I would like a normal report that shows every event count as 1, even when filtering by a month or longer.
4. Weird Cases
Event count of every record is 1 when filtering half a month, e.g. 1 Oct to 16 Oct.
Event count of every record is 2 when filtering one month, e.g. 1 Oct to 30 Oct.
Event count of every record is 3 when filtering 1.5 months, e.g. 15 Sep to 30 Oct.
5. Supporting
The reports are based on sampled data; the exploration shows this notice:
Heavily sampled exploration
This report is based on 9.52% of available data. A smaller sample size means that the data in this report is less accurate.

Related

Tricky: SQL Server-side aggregation of time-series data for charting

I have a large time-series data set in a table that contains 5 years of data. The data is very structured: it is clustered/ordered on the time column, and there is exactly one record for every 10 minutes over the entire 5-year period.
In my user-side application I have a time-series chart that is 400 pixels wide, and users can set the time scale from 1 hour up to 5 years. Therefore any query to the database by this chart that returns more than 400 records provides data that cannot be physically displayed.
What I want to know is: can anyone suggest an approach such that, when the database is queried for a certain time range, SQL Server would dynamically perform a suitable averaging aggregation that returns no more than 400 records?
Example 1): if the time range was 5 years, SQL Server would calculate ~1 value for every 4.5 days (5yrs*365days/400records required), so would average all the 10 minute samples for each 4.5 day bin and return a record for each bin. About 400 in total.
Example 2): If the time range was one month, SQL Server would calculate ~1 record for every 1.85 hours (31 days/400records), so would average all the 10 minute samples for each 1.85 hour bin and return a record for each bin. About 400 in total.
Ideally I'd like a solution that, from the application's perspective, can be queried just like a static table.
I'd really appreciate any suggested approaches or code snippets.
Some examples, assuming you have a datetime column (which is not quite clear from your question, as there is no table schema):
Grouping into interval of 5 minutes within a time range
SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)
They should be quite easy to port to SQL Server: use DATEDIFF to convert your datetime values into a Unix-style timestamp, and use ROUND() with the function parameter <> 0 (i.e. truncation) for the division into bins.
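A minimal T-SQL sketch of that idea, assuming a table dbo.Samples with columns SampleTime and SampleValue (names invented here, since the question gives no schema): derive the bin width from the requested range so that at most ~400 rows come back, then average per bin.

DECLARE @From    DATETIME = '2019-01-01',
        @To      DATETIME = '2019-02-01',
        @MaxBins INT      = 400;

-- Bin width in seconds; never finer than the native 10-minute resolution.
DECLARE @BinSeconds INT = DATEDIFF(SECOND, @From, @To) / @MaxBins;
IF @BinSeconds < 600 SET @BinSeconds = 600;

SELECT
    -- Reconstruct the start time of each bin from its bin number.
    DATEADD(SECOND,
            MIN(DATEDIFF(SECOND, @From, SampleTime) / @BinSeconds) * @BinSeconds,
            @From)        AS BinStart,
    AVG(SampleValue)      AS AvgValue,
    COUNT(*)              AS SamplesInBin
FROM dbo.Samples
WHERE SampleTime >= @From
  AND SampleTime <  @To
GROUP BY DATEDIFF(SECOND, @From, SampleTime) / @BinSeconds
ORDER BY BinStart;

Wrapping this in an inline table-valued function taking @From and @To would let the charting application query it almost like a static table, as the question asks.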

How to loop informatica sessions

I want to load data into one table for, say, one month, from 1 April to 30 April, in successive order.
I.e. after loading the data for 1 April, the date should automatically increment to 2 April, load that data, increment again, and so on until 30 April.
Also, the data for 2 April depends on the data for 1 April, so I cannot simply give a date range and load it in arbitrary order.
How can I do it?
It would be preferable to get the loads done in a single session run, instead of running the session several times.
Sort the source data by date and use a Transaction Control transformation to enforce a commit every time the date changes.

Calculate Facebook likes, comments, and shares for different time zones from saved UTC

I've been struggling with this for a while and hope someone can give me an idea of how to tackle it.
We have a service that goes out and collects Facebook likes, comments, and shares for each status update multiple times a day. The table that stores this data is something like this:
PostId | EngagementTypeId | Value | CollectedDate
100 | 1 (likes) | 10 | 1/1/2013 1:00
100 | 2 (comments) | 2 | 1/1/2013 1:00
100 | 3 (shares) | 0 | 1/1/2013 1:00
100 | 1 | 12 | 1/1/2013 3:00
100 | 2 | 3 | 1/1/2013 3:00
100 | 3 | 5 | 1/1/2013 3:00
Value holds the total for each engagement type at the time of collection.
I got a requirement to create a report that shows new value per day at different time zones.
Currently, I'm doing the calculation in a stored procedure that takes in a time zone offset, and based on that I calculate the delta for each day. If this is for someone in California, the report will show 12 likes, 3 comments, and 5 shares for 12/31/2012. But someone with a time zone offset of -1 will see 10 likes on 12/31/2012 and 2 likes on 1/1/2013.
The problem I'm having is doing the calculation on the fly can be slow if we have a lot of data and a big date range. We're talking about having the delta pre-calculated for each day and stored in a table and I can just query from that ( we're considering SSAS but that's for the next phase). But doing this, I would need to have the data for each day for 24 time zones. Am I correct (and if so, this is not ideal) or is there a better way to approach this?
I'm using SQL 2012.
Thank you!
You need to convert the UTC datetime stored in your column to a date based on the user's UTC offset. This way you don't have to worry about maintaining a table that has to be populated with pre-calculated data. To get the user's local date from your UTC column you can use something like this:
SELECT CONVERT(DATE,(DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), '01/29/2014 04:00')))
AS MyLocalDate
The SELECT statement above works out the local date based on the difference between the UTC time and the local time. You will need to replace GETDATE() with the user's datetime that is passed in to your procedure, and replace '01/29/2014 04:00' with your column. This way, when you select any date from your table, it reflects what that date was in the user's local time. Then you can calculate the other fields accordingly.
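Building on that answer, here is a sketch of the per-day delta calculation for a given offset (the table name dbo.PostEngagement and the @OffsetMinutes parameter are assumptions; since the question is on SQL 2012, LAG is available):

DECLARE @OffsetMinutes INT = -480;  -- e.g. UTC-8 for the California example

;WITH LocalDays AS (
    SELECT
        PostId,
        EngagementTypeId,
        -- Shift the stored UTC timestamp into the caller's local date.
        CONVERT(DATE, DATEADD(MINUTE, @OffsetMinutes, CollectedDate)) AS LocalDate,
        -- Value is a running total, so the day's highest reading is the end-of-day
        -- total (assuming the totals never decrease).
        MAX(Value) AS EndOfDayValue
    FROM dbo.PostEngagement
    GROUP BY PostId, EngagementTypeId,
             CONVERT(DATE, DATEADD(MINUTE, @OffsetMinutes, CollectedDate))
)
SELECT
    PostId,
    EngagementTypeId,
    LocalDate,
    -- New engagements for the day = today's total minus the previous day's total.
    EndOfDayValue - LAG(EndOfDayValue, 1, 0)
        OVER (PARTITION BY PostId, EngagementTypeId ORDER BY LocalDate) AS NewValueForDay
FROM LocalDays
ORDER BY PostId, EngagementTypeId, LocalDate;

If computing this on the fly is still too slow, the same query can feed a summary table keyed on (offset, local date), which is essentially the pre-calculation idea from the question without going all the way to SSAS.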

Cumulative Sum - Choosing Portions of Hierarchy

I have a bit of an interesting problem.
I require a cumulative sum over a set that is created from pieces of a Time dimension. The time dimension is based on hours and minutes; it begins at hour 0, minute 0 and ends at hour 23, minute 59.
What I need to do is slice out portions, say 09:30 AM - 04:00 PM or 4:30 PM - 09:30 AM, and I need these values in order to perform my cumulative sums. I'm hoping someone can suggest a means of doing this with standard MDX. If not, is my only alternative to write my own stored procedure that forms my period-to-date set extraction using the logic described above?
Thanks in advance!
You can create a secondary hierarchy in your time dimension with only the hour level and filter the query with it.
[Time].[Calendar] -> the hierarchy with year, months, day and hours level
[Time].[Hour] -> the 'new' hierarchy with only the hour level, e.g. 09:30 AM.
Then you can make an MDX query adding your criteria as a filter:
SELECT
my axis...
WHERE ( SELECT { [Time].[Hour].[09:30 AM]:[Time].[Hour].[04:00 PM] } on 0 FROM [MyCube] )
You can also create a new dimension instead of a hierarchy; the difference is in the autoexists behaviour and the performance.

SSAS -> AdventureWorks Example -> Using the browser to slice a measure by week shows results that have two records for the same week?

I have been working on a cube and noticed that when I browse measures in my cube by week, I get an unexpected result; but first let me describe my current scenario. I am looking at counts of a fact load by week. When I do so I get results like these:
Weeks | Fact Internet Sales Count
2001-07-01 00:00:00.000 | 28
2001-07-08 00:00:00.000 | 29
...and so on, as you would expect.
Further down I noticed this:
2001-09-30 00:00:00.000 | 10
2001-09-30 00:00:00.000 | 24
As you can see, it shows the week twice with different counts; when you add these counts together you get the correct count for the week (i.e. 34).
I am just confused as to why it is showing the week twice. When I look at the data in SQL I can see that the difference between the two rows is strictly the month in which the dates fell (10 in the earlier month and 24 in the later month in this example).
I initially saw this in a cube I created on my own, so I pulled up the trusty AdventureWorks practice cube and found that it was present in that cube as well.
This is because, within this date hierarchy, the lowest attribute was date, not week; therefore weeks were always split by date. This can be alleviated by making a date hierarchy with week as the lowest level.
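To illustrate what the split corresponds to at the relational level (purely a sketch; the table and column names below are hypothetical, only loosely modelled on the AdventureWorks schema): grouping by both month and week start reproduces the two rows, because the week beginning 2001-09-30 spans September and October.

-- Hypothetical relational equivalent of the browser result.
SELECT d.CalendarMonth,
       d.WeekStartDate,
       COUNT(*) AS FactInternetSalesCount
FROM dbo.FactInternetSales AS f
JOIN dbo.DimDate           AS d ON d.DateKey = f.OrderDateKey
WHERE d.WeekStartDate = '2001-09-30'
GROUP BY d.CalendarMonth, d.WeekStartDate;
-- Two rows come back (10 and 24), one per month, instead of a single row of 34.
-- Grouping by WeekStartDate alone, the analogue of a hierarchy whose lowest
-- level is the week, collapses them into one row.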
