SQL Server - defining prescence - sql-server

I have thousands of daily report spreadsheets from a number of locations that include a list of each staff member working that day. Unfortunately the reports are not easy to sort into chronological order from the filename or the creation date. I will be using .Net to populate SQL Server tables with the data.
My aim is to be able to produce a report that looking like this, showing the 2 weeks (might be more or less) that each person worked:
'Worker 1' 'Location A' 2016-04-05 to 2016-04-19
'Worker 2' 'Location A' 2016-04-19 to 2016-05-03
'Worker 2' 'Location B' 2017-01-01 to 2017-01-14
'Worker 2' 'Location A' 2017-02-01 to 2017-02-14
Unfortunately, as each report is processed, all I know is the location and a list of people that were there on the day. So, I don't know until later whether a person has finished their 2 weeks or not.
My question is - what is the best structure to store the data to allow analysis?
I thought of 1 record per location and worker, with start date & end data, but this won't work if Worker 1 works at Location A for 2 weeks, has 2 weeks off and comes back to the same location, as I would want to present that in my report as 2 sessions of work.
I also considered 1 record per day per location with 1 field per worker but this would get messy as workers came on & off hire.
I also considered 1 record per worker per location with 1 field per day but I already have 3 years worth of reports so the amount of fields will get unwieldy.
Is there a good way to record the initial data to be able to extract as I want it for analysis? Or should I just pre-process the daily reports to ensure they are processed in chronological order before writing them into tables?

Related

PowerApps - split Collection into smaller Collections based on database data

Short form of my question: Is there a way to split a Collection into several sub-Collections, based on SQL database entries?
Update
Okay, a simpler version, perhaps. I'm really struggling with this...
I start with a collection: ScanDataCollection_SmartComm_MasterList
It looks like this:
Result
REQ1991799.RITM2280596.01
REQ2048874.RITM2349401.01
REQ2037354.RITM2335400.01
I have a database table:
Master_Transaction_Log
...which has three particular columns of interest:
Timestamp
Scan_Code
Transaction_Type
I would like to end up with TWO collections:
SC_ReturnToDepot
Result
REQ1991799.RITM2280596.01
SC_Remainder_1
Result
REQ2048874.RITM2349401.01
REQ2037354.RITM2335400.01
The criteria is as follows: for any given Result in ScanDataCollection_SmartComm_MasterList, if:
A database record has Scan_Code = Result AND Transaction_Type = "New Equipment Delivery - Cust. Msg: Equipment Returning to Depot" AND Timestamp > 72 hours ago, then that value of Result is added to SC_ReturnToDepot
SC_Remainder_1 are all remaining values that do not fit the above criteria.
I got as far as this so far, but it's killin' me after this:
ClearCollect(SC_ReturnToDepot,
ForAll(ScanDataCollection_SmartComm_MasterList,
...?
);
);
I have a feeling if I can just nail that one single line of code, I am off to the races, but this is just... ugh, my brain is being a jerk.
-=-=-=-=-=-=-
Longer more detailed explanation:
I'm trying to create a mechanism that takes a list of Scan Codes from a physical in-field process (scanning codes on boxes on a shelf), and for each Code, and reviews a SQL database table (Master_Transaction_Log), looking for that code. It creates another Collection that includes those associated Scan Codes, in an exclusive fashion.
At a high level:
Collection ScanDataCollection_SmartComm_MasterList, which contains one column Result, must be split out into the seven different lists, below. These lists are in a specific sequence (because we send communications in a specific sequence):
Collection Name: ScanDataCollection_SmartComm_ReturnToDepotImmediately - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code, where Transaction_Type = "New Equipment Delivery - Cust. Msg: Equipment Returning to Depot" and where Timestamp is older than 48 hours. - Notes: The technician will be instructed to immediately place this item into an outgoing bin for pickup. - Data Note: This Scan Code must not be included in any Collection below this one on this list.
Collection Name: ScanDataCollection_SmartComm_AnnounceRemovalOfItem - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code, where Transaction_Type = "New Equipment Delivery - Cust. Msg: Final Warning" and where Timestamp is older than 48 hours. - Notes: The customer receives an email for these items, announcing that the items will be returned to the Depot. - Data Note: This Scan Code must not be included in any Collection below
this one on this list.
Collection Name: ScanDataCollection_SmartComm_SendFinalWarning - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code, where Transaction_Type = "New Equipment Delivery - Cust. Msg: Warning" and where Timestamp is older than 48 hours. - Notes: The customer receives an email for these items, announcing that this is their last chance to pick them up. - Data Note: This Scan Code must not be included in any Collection below this one on this list.
Collection Name: ScanDataCollection_SmartComm_SendFirstWarning - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code, where Transaction_Type = "New Equipment Delivery - Cust. Msg: Reminder" and where Timestamp is older than 48 hours. - Notes: The customer receives an email for these items, announcing that this is a warning to pick them up. - Data Note: This Scan Code must not be included in any Collection below this one on this list.
Collection Name: ScanDataCollection_SmartComm_SendReminder - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code, where Transaction_Type = "New Equipment Delivery - Cust. Msg: First Contact" and where Timestamp is older than 48 hours. - Notes: The customer receives an email for these items, announcing that this is a reminder to collect their order. - Data Note: This Scan Code must not be included in any Collection below this one on this list.
Collection Name: ScanDataCollection_SmartComm_SendFirstContact - Criteria: If there is an entry in the database table where Scan_Code = the given Scan Code and where Transaction_Type = "New Equipment Delivery - Received at Stockroom/Tech Bar". (note: no time delay -- this should go out immediately upon arriving at a location) - Notes: The customer receives an email for these, announcing the orders are ready to pick up.
Collection Name: ScanDataCollection_SmartComm_NoActionTaken - Criteria: The given Scan Code (Result) has not been added to any of the items above, based on previous logical rules. For example, a Reminder might have been sent, but it was sent only ten hours ago, so we take no action on this item.
Each of these COLLECTIONS only needs a single column: Result.
It's important that no single Scan Code reside in more than one collection -- they are built to address an escalating notification sequence. For example, a Scan Code for an item that has been on a shelf for a week may already have a First Contact sent, a Reminder sent, and a Warning sent. It must ONLY go into the ScanDataCollection_SmartComm_SendFinalWarning Collection, because all items in that Collection will be sent Final Warnings.
I already have written the parts of the program that generates and sends the appropriate emails (including householding, which was a tough nut to crack). I think all I need is to be able to split my Master Collection into these sub-Collections and then I can simply attack each sub-Collection using its own ForAll loop.
I would dearly dearly appreciate advice on this! And of course, I'm happy to offer clarifications where needed.
-=-=-=-=-
Larger perspective: Right now, our techs organize incoming physical boxes across different physical shelves, and each day they move sections of boxes and perform different comms based on shelf. For example, Shelf #3 gets Reminders sent, and then all items are moved to Shelf #4 for the next day. But as our production system ramps up, more boxes arrive each day. So instead of having the technicians determine which comm to send based on which shelf, and moving dozens of boxes each day, I just want them to scan the entire room and then let the tool look into the database and decide what is the next most appropriate comm to send. This will save them about 20-40 minutes a day. Furthermore, in the near future, once this system is working well, I plan to centralize comms from a single technician, instead of each tech at each remote location performing this function. But later, later...

Access Query & Email assistance

I'm decent when it comes to working with Access, and can usually figure things out but am currently stumped on this query process.
I only have 2 shipment tables. One contains historical data (tblBookedLoads), while the second table (RMCData) contains current orders.
What I need to do, is for each row in RMCData, figure out who has driven the same route recently (Origin to Destination) and send them an email with the details of the matching shipments... and any others that match, IE, if I have 50 shipments from Atlanta to Memphis tomorrow, I don't want to send them 50 emails.
I'm just not sure about how to go about retrieving the information since it it stored in different formats, or how to create a query based on row by row data from fields in another table... Let me know what you think. I'm guessing it has something to do with looping in vba, but not sure.
Both tables are read only.
tblRMCData 'Contains details for new orders (approx. 1000 rows)
' has fields 'ReferenceNum','Ship Date','Origin City', 'Origin State','Origin Zip','destination', 'Weight','Comments","Comments2"
'Origin/Destination City, State and Zip are separated
tblBookedLoads 'contains history of orders (approx. 20000 rows). CONTAINS DUPLICATES
' has fields 'Carrier', 'Ship Date", 'Order','Origin', 'Destination','Weight','Bill Amount",'email address'
' Origin/Destination is populated as "LITTLE ROCK, AR 12345"
Any advise is appreciated.

How to sum up the records on an Object with respect to a Month

Hello there Salesforce experts,
I have an Object Monthly_cc__c where there are records flowing in for each country every month.
Below are the fields
ID is the record ID
Processing Date
Processing Year
Processing Month
Distributor__c is the Master-Detal relation ship with Account.
Operating Company (IND, AUS, GBR...)
Now for every record there are the below fields
Personal CC
Monthly CC
For each Operating company these results are being sent and for the same Processing date we have various records for different Operating companies.
I would like to sum up the Personal cc for January for us to not complicate our SQL query.
Please let us know a solution on how to accomplish this task.
Thank you.
The expected result would be
For January, 2017
Personal CC = sum(AUS,IND,GBR'S Personal CC)
Monthly CC = Sum(AUS,IND,GBR'S Monthly CC)
You have lots of options here and final result also depends on whether your company uses currency management (is CurrencyIsoCode field present on all objects?).
This should be what you're asking for:
SELECT Distributor__c, SUM(Personal_CC__c) personal, SUM(Monthly_CC__c) monthly
FROM Monthly_CC__c
WHERE Processing_Year__c = 2017 AND Processing_Month__c = 1
GROUP BY Distributor__c
LIMIT 200
This should be bit more flexible:
SELECT Distributor__c, Processing_Year__c, Processing_Month__c, SUM(Personal_CC__c) personal, SUM(Monthly_CC__c) monthly
FROM Monthly_CC__c
GROUP BY Distributor__c, Processing_Year__c, Processing_Month__c
LIMIT 200
Read up about SOQL, GROUP BY & aggregate functions, GROUP BY ROLLUP().
Your data looks to be bit artificially split into day-month-year. If you have real date fields there are ways to GROUP BY CALENDAR_MONTH().
And last but not least - if this query results will be processed in Apex - you need to read about AggregateResult.
P.S. You should work on your questions. We're not writing code for you. What have you tried, what part you're stuck on? If you're completely new to salesforce and don't even know what to experiment with - start with SQOL-related Trailheads

How to store sets of objects that have occurred together during events?

I'm looking for an efficient way of storing sets of objects that have occurred together during events, in such a way that I can generate aggregate stats on them on a day-by-day basis.
To make up an example, let's imagine a system that keeps track of meetings in an office. For every meeting we record how many minutes long it was and in which room it took place.
I want to get stats broken down both by person as well as by room. I do not need to keep track of the individual meetings (so no meeting_id or anything like that), all I want to know is daily aggregate information. In my real application there are hundreds of thousands of events per day so storing each one individually is not feasible.
I'd like to be able to answer questions like:
In 2012, how many minutes did Bob, Sam, and Julie spend in each conference room (not necessarily together)?
Probably fine to do this with 3 queries:
>>> query(dates=2012, people=[Bob])
{Board-Room: 35, Auditorium: 279}
>>> query(dates=2012, people=[Sam])
{Board-Room: 790, Auditorium: 277, Broom-Closet: 71}
>>> query(dates=2012, people=[Julie])
{Board-Room: 190, Broom-Closet: 55}
In 2012, how many minutes did Sam and Julie spend MEETING TOGETHER in each conference room? What about Bob, Sam, and Julie all together?
>>> query(dates=2012, people=[Sam, Julie])
{Board-Room: 128, Broom-Closet: 55}
>>> query(dates=2012, people=[Bob, Sam, Julie])
{Board-Room: 22}
In 2012, how many minutes did each person spend in the Board-Room?
>>> query(dates=2012, rooms=[Board-Room])
{Bob: 35, Sam: 790, Julie: 190}
In 2012, how many minutes was the Board-Room in use?
This is actually pretty difficult since the naive strategy of summing up the number of minutes each person spent will result in serious over-counting. But we can probably solve this by storing the number separately as the meta-person Anyone:
>>> query(dates=2012, rooms=[Board-Room], people=[Anyone])
865
What are some good data structures or databases that I can use to enable this kind of querying? Since the rest of my application uses MySQL, I'm tempted to define a string column that holds the (sorted) ids of each person in the meeting, but the size of this table will grow pretty quickly:
2012-01-01 | "Bob" | "Board-Room" | 2
2012-01-01 | "Julie" | "Board-Room" | 4
2012-01-01 | "Sam" | "Board-Room" | 6
2012-01-01 | "Bob,Julie" | "Board-Room" | 2
2012-01-01 | "Bob,Sam" | "Board-Room" | 2
2012-01-01 | "Julie,Sam" | "Board-Room" | 3
2012-01-01 | "Bob,Julie,Sam" | "Board-Room" | 2
2012-01-01 | "Anyone" | "Board-Room" | 7
What else can I do?
Your question is a little unclear because you say you don't want to store each individual meeting, but then how are you getting the current meeting stats (dates)? In addition any table given the right indexes can be very fast even with alot of records.
You should be able to use a table like log_meeting. I imagine it could contain something like:
employee_id, room_id, date (as timestamp), time_in_meeting
Where foreign keys to employee id to employee table, and room id key to room table
If you index employee id, room id, and date you should have a pretty quick lookup as mysql multiple-column indexes go left to right such that you gain index on (employee id, employee id + room id, and employee id + room id + timestamp) when do searches. This is explained more in the multi-index part of:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
By refusing to store meetings (and related objects) individually, you are loosing the original source of information.
You will not be able to compensate for this loss of data, unless you memorize on a regular basis the extensive list of all potential daily (or monthly or weekly or ...) aggregates that you might need to question later on!
Believe me, it's going to be a nightmare ...
If the number of people are constant and not very large you can then assign a column to each person for present or not and store the room, date and time in 3 more columns this can remove the string splitting problems.
Also by the nature of your question I feel first of all you need to assign Ids to everything rooms,people, etc. No need for long repetitive string in DB. Also try reducing any string operation and work using individual data in each column for better intersection performance. Also you can store a permutation all the people in a table and assign a id for them then use one of those ids in the actual date and time table. But all techniques will require that something be constant either people or rooms.
I do not understand whether you know all "questions" in design time or it's possible to add new ones during development/production time - this approach would require to keep all data all the time.
Well if you would know all your questions it seems like classic "banking system" which recalculates data on daily basis.
How I think about it.
Seems like you have limited number of rooms, people, days etc.
Gather logging data on daily basis, one table per day. Just one event, one database row, all information (field) what you need.
Start to analyse data using some crone script at "midnight".
Update stats for people, rooms, etc. Just increment number of hours spent by Bob in xyz room etc. All what your requirements need.
As analyzed data are limited and relatively small as you analyzed (compress) them, your system can contain also various queries as indexes would be relatively small etc.
You could be able to use scalable map/reduce algorithm.
You can't avoid storing the atomic facts as follows: (the meeting room, the people, the duration, the day), which is probably only a weak consolidation when the same people meet multiple times in the same room on the same day. Maybe that happens a lot in your office :).
Making groups comparable is an interesting problem, but as long as you always compose the member strings the same, you can probably do it with string comparisons. This is not "normal" however. To normalise you'll need a relation table (many to many) and compose a temporary table out of your query set so it joins quickly, or use an "IN" clause and a count aggregate to ensure everyone is there (you'll see what I mean when you try it).
I think you can derive the minutes the board room was in use as meetings shouldn't overlap, so a sum will work.
For storage efficiency, use integer keys for everything with lookup tables. Dereference the integers during the query parsing, or just use good old joins if you are feeling traditional.
That's how I would do it anyway :).
You'll probably have to store individual meetings to get the data you need anyway.
However you'll have to make sure you aggregate and anonymise it properly before creating your reports. Make sure to separate concerns and access levels to stay within the proper legal limits on data.

How do I get around the Sum(First(...)) not allowed limitation is SSRS2005

The problem that I have is SQL Server Reporting Services does not like Sum(First()) notation. It will only allow either Sum() or First().
The Context
I am creating a reconciliation report. ie. what sock we had a the start of a period, what was ordered and what stock we had at the end.
Dataset returns something like
Type,Product,Customer,Stock at Start(SAS), Ordered Qty, Stock At End (SAE)
Export,1,1,100,5,90
Export,1,2,100,5,90
Domestic,2,1,200,10,150
Domestic,2,2,200,20,150
Domestic,2,3,200,30,150
I group by Type, then Product and list the customers that bought that product.
I want to display the total for SAS, Ordered Qty, and SAE but if I do a Sum on the SAS or SAE I get a value of 200 and 600 for Product 1 and 2 respectively when it should have been 100 and 200 respectively.
I thought that i could do a Sum(First()) But SSRS complains that I can not have an aggregate within an aggregate.
Ideally SSRS needs a Sum(Distinct())
Solutions So Far
1. Don't show the Stock at Start and Stock At End as part of the totals.
2. Write some code directly in the report to do the calc. tried this one - didn't work as I expected.
3. Write an assembly to do the calculation. (Have not tried this one)
Edit - Problem clarification
The problem stems from the fact that this is actually two reports merged into one (as I see it). A Production Report and a sales report.
The report tried to address these criteria
the market that we sold it to (export, domestic)
how much did we have in stock,
how much was produced,
how much was sold,
who did we sell it to,
how much do we have left over.
The complicating factor is the who did we sell it to. with out that, it would have been relativly easy. But including it means that the other top line figures (stock at start and stock at end) have nothing to do with the what is sold, other than the particular product.
I had a similar issue and ended up using ROW_NUMBER in my query to provide a integer for the row value and then using SUM(IIF(myRowNumber = 1, myValue, 0)).
I'll edit this when I get to work and provide more data, but thought this might be enough to get you started. I'm curious about Adolf's solution too.
Pooh! Where's my peg?!
Have you thought about using windowing/ranking functions in the SQL for this?
This allows you to aggregate data without losing detail
e.g. Imagine for a range of values, you want the Min and Max returning, but you also wish to return the initial data (no summary of data).
Group Value Min Max
A 3 2 9
A 7 2 9
A 9 2 9
A 2 2 9
B 5 5 7
B 7 5 7
C etc..
Syntax looks odd but its just
AggregateFunctionYouWant OVER (WhatYouWantItGroupedBy, WhatYouWantItOrderedBy) as AggVal
Windowing
Ranking
you're dataset is a little weird but i think i understand where you're going.
try making the dataset return in this order:
Type, Product, SAS, SAE, Customer, Ordered Qty
what i would do is create a report with a table control. i would set up the type, product, and customer as three separate groups. i would put the sas and sae data on the same group as the product, and the quantity on the customer group. this should resemble what i believe you are trying to go for. your sas and sae should be in a first()
Write a subquery.
Ideally SSRS needs a Sum(Distinct())
Re-write your query to do this correctly.
I suspect your problem is that you're written a query that gets you the wrong results, or you have poorly designed tables. Without knowing more about what you're trying to do, I can't tell you how to fix it, but it has a bad "smell".

Resources