Find out the popularity of values - sql-server

I have a table with 1000 rows. Each row represents a prompt text for an application. To start, I only want to translate the most-used 20% of the prompts. In daily use, some dialogs appear more often than others, so the prompt texts for the most-displayed dialogs get fetched more often than the others.
However, it looks to me like there is no built-in mechanism to analyse the data by how often it is selected.
There are no triggers on SELECT, there is no way to filter the data in the Profiler, and there is no way to filter the data in an Audit. Is that true?
Are there any options to do that inside the SQL Server?

No. There is no built-in way to track how often data is selected.
This sounds like application metrics. You will have to write the metrics logic yourself.
For example, you might create a table of MissingTranslations that tracks the frequency of requests. If your application detects a missing translation, insert a row into this table with a frequency of 1, or increment the counter if it already exists in the table.
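A sketch of that upsert in T-SQL, assuming a hypothetical MissingTranslations table keyed by a prompt id passed in as @PromptId/@PromptText (MERGE keeps the insert-or-increment atomic):
-- Hypothetical table: one row per requested prompt, with a request counter.
CREATE TABLE dbo.MissingTranslations (
    PromptId   INT            NOT NULL PRIMARY KEY,
    PromptText NVARCHAR(4000) NOT NULL,
    Frequency  INT            NOT NULL
);

-- Insert with a frequency of 1, or increment if the prompt is already tracked.
MERGE dbo.MissingTranslations WITH (HOLDLOCK) AS t
USING (SELECT @PromptId AS PromptId, @PromptText AS PromptText) AS s
    ON t.PromptId = s.PromptId
WHEN MATCHED THEN
    UPDATE SET Frequency = t.Frequency + 1
WHEN NOT MATCHED THEN
    INSERT (PromptId, PromptText, Frequency)
    VALUES (s.PromptId, s.PromptText, 1);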
You could then write another application that sorts the missing translations by frequency descending. When a user enters the translation, the translation app removes the entry from the list of missing translations or marks it as complete.
All that being said, you could abuse some SQL Server features to get some information. For example, a stored procedure that returns these translations could generate a user-configurable trace event with the translation info. A SQL Profiler session could listen for these events and write them to a table. This would get you a basic frequency.
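A sketch of raising such an event from the stored procedure (event ids 82 through 91 are reserved for the ten user-configurable events; @PromptId is a hypothetical parameter):
-- Raise UserConfigurable:0 (event id 82), carrying the prompt id
-- so a Profiler session listening for it can count occurrences.
DECLARE @info NVARCHAR(128) = N'prompt:' + CAST(@PromptId AS NVARCHAR(20));
EXEC sp_trace_generateevent @eventid = 82, @userinfo = @info;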
It might be possible to get the same information from implementing auditing and then calling sys.fn_get_audit_file, but that sounds cumbersome at best.
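If you did go that route, reading the audit file back is at least simple; assuming an audit that writes .sqlaudit files to a hypothetical C:\Audits folder:
-- Read all audit files matching the pattern; event_time and statement
-- are enough to derive a basic per-prompt frequency.
SELECT event_time, statement
FROM sys.fn_get_audit_file('C:\Audits\*.sqlaudit', DEFAULT, DEFAULT);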
In my opinion, it is easier and more stable to write this logic yourself.

TabAlleman: "no, there's nothing you can do"


Salesforce Admin Sharing Rule

[Note: There is a Teacher object with fields such as Teacher Name, DateofJoining, and also a formula field called Experience]
My task was to create a Public Group consisting of another user,
and this user should only see teachers who have more than 2 years of experience.
But when I create a sharing rule based on criteria, the field called Experience doesn't show up, as it is a formula field.
So I got the idea of creating a new field (maybe a text or number data type) which would hold the value of Experience. (But I have no idea how to implement this.)
Is there a way to implement this?
Any other solution is also well appreciated!
Hard to say.
The normal trick would be to create a helper field (text, number, whatever) and a piece of functionality that populates it: ideally an "early flow" or a "before insert, before update" trigger; worst case a normal flow, process builder, or "after insert, after update" trigger. Something like "if Experience__c != 'your formula here' then Experience__c = 'your formula here'". Consult the normal SF help and Trailhead if you've never used early flows.
You'd make a one-off data fix to populate existing records, and job done: a normal field should be selectable as sharing rule criteria.
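A minimal sketch of the trigger variant, assuming hypothetical API names Teacher__c, DateofJoining__c and a helper number field Experience_Years__c:
trigger TeacherExperience on Teacher__c (before insert, before update) {
    for (Teacher__c t : Trigger.new) {
        if (t.DateofJoining__c != null) {
            // Mirror the formula: whole years since joining (integer division).
            t.Experience_Years__c = t.DateofJoining__c.daysBetween(Date.today()) / 365;
        }
    }
}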
=====
But I smell trouble with your formula. What exactly do you have there, something like Experience__c = (TODAY() - DateofJoining__c) / 365? That's a bit evil. Formulas with TODAY(), NOW() or anything with $ (roughly speaking, who's looking at the data: the user's name, profile, role... not what's actually on the record itself) are "nondeterministic". Unpredictable.
A "today()" changes just like that, without updating the record. Sure, when you watch the record a fresh value will be calculated but other than that LastModifiedDate doesn't change, there's no magical trigger running at midnight that rechecks sharing. (especially that there's no single midnight, you could have users in multiple timezones). SF just doesn't allow nondeterministic fields in many places, see https://salesforce.stackexchange.com/q/32122/799
So if you do rely on TODAY() in your formula, you might have to make a "scheduled flow" or read about Schedulable, Batchable Apex. Create a nightly job that runs and recalculates your helper field with the right experience. You'd probably even need both solutions: a "before save" flow for new data created today, and a nightly job to advance the clock on existing old data...
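A sketch of the nightly half, assuming a hypothetical RecalcExperienceBatch class that does the actual field update:
// Runs every night at 1am and kicks off the recalculation batch.
global class RecalcExperienceSchedule implements Schedulable {
    global void execute(SchedulableContext sc) {
        Database.executeBatch(new RecalcExperienceBatch());
    }
}
// One-off setup, e.g. from Anonymous Apex:
// System.schedule('Recalc experience', '0 0 1 * * ?', new RecalcExperienceSchedule());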

Given any random accountId, how to find its ultimate parent account - looking for the best optimized solution (for a multi-level hierarchy)

Given any random accountId, how do I find its ultimate parent account? I'm looking for the best-optimized solution for a multi-level hierarchy, excluding the 10-levels-of-formula-fields solution.
It depends. Optimized for what: reads (an instant, simple answer when querying) or writes (an easy save but more work when reading)?
If you want easy reads, you need to put in some effort when saving the data. And remember you can't get away with a simple custom lookup called "Ultimate Parent", because for a standalone account SF will not let you form a cycle by creating a record that looks up to itself. You might need two text fields (Id and Name), or some convention that yes, you'll make a lookup to Account, but if it's blank the reading process needs to check the ParentId field too to determine what exactly is going on. (You could make a formula field to simplify reading, but still, don't think you're getting away with a simple lookup.)
How much data do you have, and how deep are the hierarchies? The basic answer is to keep track of the ultimate parent on every insert, update, delete and undelete. Write a trigger; a SOQL query can go "up" 5 "dots" max:
SELECT ParentId,
Parent.ParentId,
Parent.Parent.ParentId,
Parent.Parent.Parent.ParentId,
Parent.Parent.Parent.Parent.ParentId,
Parent.Parent.Parent.Parent.Parent.ParentId
FROM Account
WHERE Id IN :trigger.new
It gets messier if you need multiple queries (but still, this form would be the most effective). You might also hit performance issues when something reparents close to the top of the tree and you're suddenly looking at having to cascade-update hundreds of accounts. Remember you have a limit of 10K rows inserted/updated/deleted in a single operation. You might have to propagate the changes down as a batch/future/queueable async process.
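A sketch of the multi-query fallback for trees deeper than five levels; note that each loop iteration costs one SOQL query against the governor limit:
public class AccountHierarchy {
    // Hypothetical helper: walk up from any account until there's no parent left.
    public static Id findUltimateParent(Id accountId) {
        Id current = accountId;
        Account a = [SELECT ParentId FROM Account WHERE Id = :current];
        while (a.ParentId != null) {
            current = a.ParentId;
            a = [SELECT ParentId FROM Account WHERE Id = :current];
        }
        return current; // nothing above it: this is the ultimate parent
    }
}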
Another option would be to have a flat helper object aside from the account table, with a unique id set to the account id. Have a background process periodically refreshing that table, even every hour, using a batch job or a reporting snapshot. Still not great if you have millions of accounts, and a waste of storage... but maybe you could use Big Objects.
Have you ever used Platform Cache? If the ultimate parent has to be fetched via Apex (instead of being a real field on Account), you could try to make some kind of "linked list" implementation where you store Id -> ParentId in the cache and can traverse it without wasting any queries. The cache's max lifetime is 48h (so you might still need a nightly job to rebuild it) and you'd still have to update it on every insert/update/delete/undelete...
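A sketch of the cache idea, assuming a hypothetical org cache partition named AccountTree (cache keys must be alphanumeric, which 18-character account ids are):
// Build the "linked list": one cache entry per account that has a parent.
Cache.OrgPartition part = Cache.Org.getPartition('local.AccountTree');
for (Account a : [SELECT Id, ParentId FROM Account WHERE ParentId != null]) {
    part.put(String.valueOf(a.Id), String.valueOf(a.ParentId));
}

// Traverse without queries: a cache miss means we hit the ultimate parent.
Id current = someAccountId; // hypothetical starting point
String parent = (String) part.get(String.valueOf(current));
while (parent != null) {
    current = Id.valueOf(parent);
    parent = (String) part.get(parent);
}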
So yeah, "it depends". Write more about your requirements.

Salesforce SOQL query - Jersey ReadTimeout error

I'm having a problem with a batch job that runs a simple SOQL query returning a lot of records. More than a million.
The query, as it is, cannot be optimized much further according to SOQL best practices. (At least, as far as I know. I'm not an SF SOQL expert.)
The problem is that I'm getting -
Caused by: javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: Read timed out
I tried bumping up the Jersey read timeout value from 30 seconds to 60 seconds, but it still times out.
Any recommendation on how to deal with this issue? Any recommended value for the readtimeout parameter for a query that returns that much data?
The query is like this:
SELECT Id, field1, field2__c, field3__c FROM Object__c
WHERE field2__c = true AND (NOT field3__c LIKE '\u0025Some string\u0025')
ORDER BY field4__c ASC
In no specific order...
Batches written in Apex time out after 2 minutes, so maybe set the same limit in your Java application.
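For Jersey 2.x that looks like this (120000 ms to match the 2-minute Apex ceiling; class and factory names are illustrative):
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import org.glassfish.jersey.client.ClientProperties;

public class SalesforceClientFactory {
    // Build a client whose read timeout matches the 2-minute Apex batch ceiling.
    public static Client create() {
        Client client = ClientBuilder.newClient();
        client.property(ClientProperties.CONNECT_TIMEOUT, 10000);  // ms
        client.property(ClientProperties.READ_TIMEOUT, 120000);    // ms
        return client;
    }
}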
Run your query in the Developer Console using the query plan feature (you will probably have to put a real % in there, not \u0025). Pay attention to which part has a "Cost" column value > 1.
What are the field types? Plain checkbox and text, or some complex formulas?
Is that text static, or does it change depending on what your app needs? Would you consider filtering out the string in your code rather than in SOQL? It's counter-intuitive to return more records than you really need, but it might be an option.
Would you consider making a formula field with either the whole logic or just the string search, and then asking SF to index the formula? Or maybe making another field (another checkbox?) with the "yes, it contains that text" info, setting its value by workflow maybe (essentially preparing your data a bit so you can query it efficiently later).
Read up on skinny tables and see if they're something that could work for you (needs SF support).
Can you make an analytic snapshot of your data (make a report, have SF save the results to a helper object, query that object)? Even if it just contained lookups to your original source, so you'd always access fresh values, it could help. Might be a storage killer though.
Have you considered "big objects" and async SOQL?
I'm not proud of it, but in the past I had some success badgering the SF database. Not via the API, but when I had a nightly batch job that was timing out, I kept resubmitting it and eventually, on the 3rd-5th try, it managed to start. Something in the query optimizer, the creation of the cursor in the underlying Oracle database, caching of partial results... I don't know.
What's in the ORDER BY? Some date field? If you need records updated since X first, then maybe the replication API could help by getting the ids first.
Does it make sense to use LIMIT 200, for example? Which API are you using, SOAP or REST? It might be that returning smaller chunks (SOAP: batch size, REST API: a special header) would help it finish faster; see the sketch below.
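For the REST API that special header is Sforce-Query-Options; a sketch using the hypothetical client factory above (instanceUrl, accessToken and soql are assumed to exist):
// Hint to the REST API to return up to 2000 records per chunk.
javax.ws.rs.core.Response resp = SalesforceClientFactory.create()
    .target(instanceUrl + "/services/data/v52.0/query")
    .queryParam("q", soql)
    .request("application/json")
    .header("Authorization", "Bearer " + accessToken)
    .header("Sforce-Query-Options", "batchSize=2000")
    .get();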
When all else fails (but do contact SF support and make sure you've exhausted the options), maybe restructure the whole thing. Make SF push data to you whenever it changes, rather than pulling it. There's the "Streaming API" (a CometD implementation, Bayeux protocol, however these are called), "Change Data Capture" and "Platform Events" for nice event-bus-driven architecture decisions, with replay of old events up to 3 days back if the client was down and couldn't listen... But that's a totally different topic.

Data design: What's the best way to store tasks generated by a scheduler?

This is a data design problem I am facing right now.
So I am currently designing a scheduling application for a farm. To deal with orders coming in, I've designed the following class called Forecast. Forecast has the following attributes:
Due_date
Entries - This is a list of {crop: X, quantity: Y} pairs which indicates how much of each crop is required.
As you know, each crop has different durations for:
Propagation Duration (from seed to planting)
Spacing Duration (from planted to more spaced out for maximum sunlight)
Storage Duration (from harvest to expiry)
I wrote a "backward scheduler" which figures out when an order has to be met, then goes backwards and creates task for what needs to be done on what date.
Note that these calendar "entries" are not stored in the database - I generate them on the fly, and I plan to allow my users to drag these events forward and backward depending on their preference.
Now, I am tasked with generating reports for each day on what needs to be done. My users would like to be able to generate a list of tasks for a particular date. (So if you look at my screenshot, May 1st would result in one task being created: 'transplanting 12 units of Minutina'.)
I am not sure what's the best way to go about accomplishing this. My initial design is:
Create a table for tasks. This will include: (forecast_id, forecast_updated_at)
Store the tasks generated by my scheduler into this table.
When a user modifies a task, the database entry will be updated correspondingly.
However, if the user decides to modify the original Forecast, then we will have to delete all the tasks associated with this forecast, regenerate a new set of tasks, and store them again in the table mentioned above.
The purpose of the (forecast_id, forecast_updated_at) is to allow me to quickly verify the correctness of the tasks currently in my database.
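Concretely, a sketch of that tasks table in SQL (column names illustrative; the forecasts table is assumed):
CREATE TABLE tasks (
    task_id             INT PRIMARY KEY,
    forecast_id         INT NOT NULL REFERENCES forecasts (forecast_id),
    forecast_updated_at DATETIME NOT NULL,     -- snapshot of the forecast's updated_at
    action              VARCHAR(50) NOT NULL,  -- e.g. 'transplant'
    crop                VARCHAR(50) NOT NULL,
    quantity            INT NOT NULL,
    due_date            DATE NOT NULL
);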
This design is a little messy and I would like to hear from any experts out there for a better approach to this problem.

There is probably a name for this. Please re-title appropriately

I'm evaluating the idea of building a set of generic database tables that will persist user input. There will then be a secondary process to kick off a workflow and process the input.
The idea is that the notion of saving the initial user input is separate from processing and putting it into the structured schema for a particular application.
An example might be some sort of job application or quiz with open-ended questions. The raw answers will not be super valuable to us for aggregate reporting without some human classification. But, we do want to store the raw input as a historical record.
We may also want the user to be able to partially fill out some information and have it persisted until he returns.
Processing all the input to the point where we can put it into our application-specific data schema may not be possible until we have ALL the data.
Two initial questions:
Assuming this concept has a name, what is it?
Is this a reasonable approach? Why or why not?
Update:
Here's another way to state the idea. The user is sequentially populating fields in a DTO. I (think I) want to save the DTO to disk even in a partially-complete state. Once the user has completed populating the fields, I want to pull out the DTO and process it for structured saving into a table which represents the specific DTO. I can't, however, save a partially complete or (worse) a temporarily incorrect set of input since some of the input really shouldn't be stored as part of the structured record.
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed. So maybe this generic DTO table stores data relating to customer satisfaction surveys right next to questions answered in a new account setup wizard.
You stated:
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed.
I think you're one level of abstraction off. I would argue that the entire database is fulfilling the role you want a limited set of tables to perform. You could create some kind of complicated storage schema that wouldn't represent the data in any meaningful way, and then (slowly and painfully, from the DBMS's perspective) merge and render a view of the data... but I would suggest that this is an over-engineered solution.
I've written several applications where, because of custom user requirements, a (sometimes significant) portion of the application is dynamic - constructed by the user, from the schema to the business rules. The ones that manufactured their storage schemas by executing statements like CREATE TABLE and ALTER TABLE were, surprisingly, the ones easiest to maintain. They also allow users to create reports in a very straightforward, expected way.
Sounds like you're initially storing the data in a normalized form (generic), and once you have the complete set you are denormalizing it (structured schema).
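In table form, the generic (normalized) side of that often ends up as a sketch like this (names illustrative):
-- One row per answered field, regardless of which DTO/form it belongs to.
CREATE TABLE raw_input (
    submission_id INT           NOT NULL, -- one per partially filled DTO
    dto_type      VARCHAR(100)  NOT NULL, -- e.g. 'JobApplication', 'SurveyResponse'
    field_name    VARCHAR(100)  NOT NULL,
    field_value   VARCHAR(4000) NULL,
    entered_at    DATETIME      NOT NULL,
    PRIMARY KEY (submission_id, field_name)
);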
You might be speaking about Workflow. You might want to check out Windows Workflow.
The concept of Workflow is that it mirrors the processes of real life. That is to say, you complete a document, but the document is not complete until it has been approved. In your case, that would be 'data is entered' but unclassified, so it is stored in the database (dehydrated) and a flag is raised for whoever needs to deal with the issue. It can persist in this state for as long as necessary. Once someone is able to deal with it, the workflow is kicked off again (hydrated) and continues to the next steps.
Here are some SO questions regarding workflows:
This question: "Is it better to have one big workflow or several smaller specific ones?" clears up some of the ways that workflow can be used, and also highlights some issues with it.
John Saunders has a very good breakdown of what workflow is good for in this question.
