Adding a Unique Index When Duplicate Values Exist in the Column - salesforce

I have a problem making an existing field unique.
Standard object: Account.
Field: Email (several Account records with the same email address have already been created).
I want to make the field unique (no repeated values). How can I deal with the existing data correctly? If I try to do it through point-and-click setup, an error is generated:
Error: Duplicate value(s) found when building unique index, example: blabla@gmail.com on rows

You'll have to delete the duplicate records (or maybe just clear the email field on some of them) before you can apply the unique index. Salesforce has duplicate rules that you could use to find matches. Maybe the answer is to merge the records rather than delete them? You can merge manually or with Apex. Or just run a report on Accounts grouped by email, sort by count descending, and see what needs fixing.
If the bad data has to stay as is, the best option would be to configure duplicate rules to look at the field and prevent new duplicates from being created. The second module of this Trailhead has some screenshots that show how you can configure a rule to BLOCK inserts and updates. It's not as good as a true unique field, but it's something.
Worst case, you could cheat a bit. Depending on the address: if it's Gmail, then googlemail.com also works, as does putting dots in the part before the @. This won't work with all addresses, though.
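The "group by email and count" idea can also be tried offline on an export before touching the org. This is only a minimal sketch: it assumes the Account records have been exported into a list of dictionaries with hypothetical `Id` and `Email` keys, and it treats emails as case-insensitive, which is an assumption rather than something the question states.

```python
from collections import defaultdict

def find_duplicate_emails(accounts):
    """Group account Ids by normalized email and keep only groups with more than one record."""
    groups = defaultdict(list)
    for account in accounts:
        # Normalization (trim + lowercase) is an assumption made for this sketch
        email = (account.get("Email") or "").strip().lower()
        if email:  # records without an email are skipped here
            groups[email].append(account["Id"])
    duplicates = {email: ids for email, ids in groups.items() if len(ids) > 1}
    # Largest groups first, like a report sorted by record count descending
    return sorted(duplicates.items(), key=lambda item: len(item[1]), reverse=True)

# Hypothetical sample data
accounts = [
    {"Id": "001A", "Email": "blabla@gmail.com"},
    {"Id": "001B", "Email": "BlaBla@gmail.com "},
    {"Id": "001C", "Email": "other@example.com"},
    {"Id": "001D", "Email": None},
]
print(find_duplicate_emails(accounts))
```

Each group in the result is a candidate for a merge, a delete, or clearing the field on all but one record.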

Related

Design scenario of a DynamoDB table

I am new to DynamoDB and after reading several docs, there is a scenario in which I am not sure which would be the best approach for designing a table.
Consider that we have some JobOffers and we should support the following data access:
get a specific JobOffer
get all JobOffers from a specific Company sorted by different criteria (newest/oldest/wage)
get JobOffers from a specific Company filtered by a specific city sorted by different criteria (newest/oldest/wage)
get all JobOffers (regardless of any Company !!!) sorted by different criteria (newest/oldest/wage)
get JobOffers (regardless of any Company !!!) filtered by a specific city sorted by different criteria (newest/oldest/wage)
Since we need to support sorting, my understanding is that we should use Query instead of Scan. In order to use Query, we need to use a primary key. Because we need to support a search like "get all JobOffers without any filters, sorted somehow", what would be a good candidate for the partition key?
As a workaround, I was thinking of using a new attribute "country" as the partition key, but since all JobOffers are currently in one country, all these items fall under the same partition, so it might be a bit redundant until we support JobOffers from different countries.
Any suggestion on how to design JobOffer table (with PK and GSI/LSI) for such a scenario?
Designing a DynamoDB table is best done with an access-pattern approach, that is: how are you going to access the data in it? You have information X, you need Y.
Also remember that DynamoDB is NOT SQL, and it does not require every item to have the same shape. Consider each entry a document, with its PK/SK acting almost like a directory/item structure in a file system.
So for example:
You have user data. You have things like: Avatar data (image name, image size, image type), Login data (salt/pepper hashes, email address, username), Post history (post title, identifier, content, replies). Each user will have only 1 Avatar item and 1 Login item, but many Post items.
You know that from the user page you are always going to have the user ID, 100% of the time. This should then be your PK (your hash key, or partition key). The rest of the things you need then inform your sort key/range key.
PK: USER#123456 - SK: AVATAR - Attributes: (image name, image size, image type)
PK: USER#123456 - SK: LOGIN - Attributes: (salt/pepper hashes, email address, username)
PK: USER#123456 - SK: POST#123 - Attributes: (post title, identifier, content, replies)
PK: USER#123456 - SK: POST#125 - Attributes: (post title, identifier, content, replies)
PK: USER#123456 - SK: POST#193 - Attributes: (post title, identifier, content, replies)
This way you can do a query with the user ID and get ALL the user data back. Or, if you are on a page that just displays their post history, you can do a query against the user ID with an SK that begins with POST and get back all their posts.
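The two queries described above can be sketched as request parameters for the DynamoDB Query operation. This is an illustration under assumptions: the table name `AppTable` is hypothetical, the key attributes are assumed to be named `PK` and `SK`, and the dictionaries use the low-level attribute-value format, so they could be passed to something like a boto3 client's `query(**params)` call.

```python
def user_query_params(table_name, user_id, sk_prefix=None):
    """Build keyword arguments for a DynamoDB Query on one user's item collection."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"USER#{user_id}"}},
    }
    if sk_prefix is not None:
        # Narrow the item collection to sort keys that start with the prefix
        params["KeyConditionExpression"] += " AND begins_with(SK, :sk)"
        params["ExpressionAttributeValues"][":sk"] = {"S": sk_prefix}
    return params

# "AppTable" is a hypothetical table name
everything = user_query_params("AppTable", "123456")
posts_only = user_query_params("AppTable", "123456", sk_prefix="POST#")
print(everything["KeyConditionExpression"])
print(posts_only["KeyConditionExpression"])
```

The first set of parameters returns the avatar, login and post items together; the second returns only the items whose sort key starts with `POST#`.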
You can put in an inverted index (SK -> PK and vice versa) and then do a query on POST#193 and get back the user ID. Or, if you have other PK types with POST#193 as the SK, you get more information there (like a REPLIES#193 PK or something).
The idea here is that you have X bits of information, and you need to craft your table to be able to retrieve as much as possible with just that information. By using prefixes on your SKs, you can then narrow the results a little.
Note!
Sometimes this means a duplication of information! You may have the same information under two sets of keys. This is OK and kind of expected when you start getting into really complex relationships. You can mitigate it somewhat with indexes, but you should aim to avoid them where possible, as they do introduce a bit of lag in terms of data propagation (it's tiny, but it can add up).
So you have your list of things you want to get for your dynamo. What will you always have to tie them together? What piece of data do you have that will work?
You can do the first 3 with a company PK identifier and a reverse index. That will let you look up all of a company's jobs or, using the reverse index, a specific job. Or, if you always know the company when looking up a specific job, that lookup can use the main table keys directly.
Company# - Job# - data data data
You then do the sorting on your own, OR you add some sort of sort value to the Job# key; sort keys are inherently sorted, after all: Company# - Job#1234#UNITED_STATES
Of course, this will only work for one sort at a time. You can make more than one index, but again, data sync lag is a real possibility.
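One detail worth sketching: string sort keys are ordered as strings, so a value meant to drive the ordering has to come before the job ID in the key, and numbers need a fixed width. The example below is a minimal sketch; the `WAGE#...#JOB#...` layout and the padding width are assumptions made for illustration, not part of the design above.

```python
def wage_sort_key(job_id, wage, width=9):
    """Embed a zero-padded wage in the sort key so string order matches numeric order."""
    return f"WAGE#{wage:0{width}d}#JOB#{job_id}"

# Hypothetical jobs as (job_id, wage) pairs
jobs = [("1234", 52000), ("321", 9000), ("456", 120000)]

# sorted() stands in for the ordering DynamoDB applies to string sort keys
padded = sorted(wage_sort_key(job_id, wage) for job_id, wage in jobs)
unpadded = sorted(f"WAGE#{wage}#JOB#{job_id}" for job_id, wage in jobs)

for key in padded:
    print(key)
```

Without the padding, `120000` would sort before `52000` and `9000`, because the comparison happens character by character.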
But how do you do this regardless of Company? Well, you can have another index with your searchable attribute (Country, for example) as the PK, and then you can query that.
Or do you have another set of data that can tie this all together? Do you have another thing that can reach it all?
If not, you may just have two kinds of items in your table:
Company#1234 - Job#321 - details
Company#1234 - Country#United_states - job#321, job#456, job#1234
Company#1234 - Country#England - job#992, job#123, job#19231
Your reverse index would apply here: you could do a query on PK: Country#United_states and you'd get back:
Country#United_states - Company#1234 - job#321, job#456, job#1234
Country#United_states - Company#4556
Country#United_states - Company#8322
This isn't a relational database, however! So you have to do one of two things: either use those job#s to query that company and then filter the jobs by what you want (bad, since we're trying to avoid multiple queries!), or make each job# an attribute on the country SKs that contains a copy of the relevant data in a map format {job title, job#, country, company, salary}. Then, when a user clicks on a job to go to its details, the application makes a direct call to the job query and gets the details to display.
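The second option, keeping a summary copy of each job on the country item, could look roughly like the sketch below. The function name and the input shape are hypothetical; only the key layout and the map fields follow the description above.

```python
def country_listing_item(country, company_id, jobs):
    """Build one item keyed by country whose `jobs` map holds a summary copy of each job."""
    return {
        "PK": f"Country#{country}",
        "SK": f"Company#{company_id}",
        "jobs": {
            f"job#{job['id']}": {
                "title": job["title"],
                "company": company_id,
                "country": country,
                "salary": job["salary"],
            }
            for job in jobs
        },
    }

# Hypothetical sample data
item = country_listing_item(
    "United_states",
    "1234",
    [
        {"id": "321", "title": "Backend Developer", "salary": 50000},
        {"id": "456", "title": "Data Engineer", "salary": 65000},
    ],
)
print(sorted(item["jobs"]))
```

Since a DynamoDB item is limited to 400 KB, a map like this only suits companies with a modest number of jobs per country; beyond that, one item per job under the country key would be the safer layout.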
Again, it all comes down to access patterns. What do you have, and how can you arrange it in a way that lets you get what you need fast?

Amazon DynamoDB Single Table Design For Blog Application

New to this community. I need some help designing the Amazon DynamoDB table for my personal project.
As an overview, this is a simple photo gallery application with the following attributes.
UserID
PostID
S3URL
Caption
Likes
Reports
UploadTime
I wish to perform the following queries:
For a given user, fetch 'N' most recent posts
For a given user, fetch 'N' most liked posts
Give 'N' most recent posts (Newsfeed)
Give 'N' most liked posts (Newsfeed)
My solution:
With UserID as the partition key, PostID as the sort key, and Likes and UploadTime as local secondary indexes, I can solve the first two queries.
I'm confused about how to perform the query operation for 3 and 4 (Newsfeed). I know that without a partition key I cannot query, and a scan is not an effective solution. Is there any workaround for operations 3 and 4?
Any idea how I should design my DB?
It looks like you're off to a great start with your current design, well done!
For access pattern #3, you want to fetch the most recent posts. One way to approach this is to create a global secondary index (GSI) to aggregate posts by their creation time. For example, you could create an attribute named GSI1PK on your main table, assign it a value of POSTS, and use the upload_time field as the sort key. That would look something like this:
Viewing the secondary index (I've named it GSI1), your data would look like this:
This would allow you to query for Posts and sort by upload_time. This is a great start. However, your POSTS partition will grow quite large over time. Instead of choosing POSTS as the partition key for your secondary index, consider using a truncated timestamp to group posts by date. For example, here's how you could store posts by the month they were created:
Storing posts using a truncated timestamp will help you distribute your data across partitions, which will help your DB scale. If a month is too long, you could use truncated timestamps for a week/day/hour/etc. Whatever makes sense.
To fetch the N most recent posts, you'd simply query your secondary index for POSTS in the current month (e.g. POSTS#2021-01-00). If you don't get enough results, run the same query against the prior month (e.g. POSTS#2020-12-00). Keep doing this until your application has enough posts to show the client.
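The "keep querying earlier months until you have enough" loop can be sketched as follows. The `query_partition` callable is a hypothetical stand-in for a query against the secondary index, assumed to return one month's posts newest first; the in-memory dictionary exists only to make the example runnable.

```python
from datetime import date

def month_partitions(start, max_months):
    """Yield partition keys such as POSTS#2021-01-00, newest month first."""
    year, month = start.year, start.month
    for _ in range(max_months):
        yield f"POSTS#{year:04d}-{month:02d}-00"
        month -= 1
        if month == 0:
            year, month = year - 1, 12

def recent_posts(query_partition, n, start, max_months=12):
    """Collect up to n posts by querying one month partition at a time."""
    posts = []
    for partition_key in month_partitions(start, max_months):
        # Ask only for as many posts as are still missing
        posts.extend(query_partition(partition_key, n - len(posts)))
        if len(posts) >= n:
            break
    return posts[:n]

# In-memory stand-in for the GSI, for demonstration only
fake_index = {
    "POSTS#2021-01-00": ["post-9"],
    "POSTS#2020-12-00": ["post-8", "post-7", "post-6"],
}
newest = recent_posts(lambda pk, limit: fake_index.get(pk, [])[:limit], 3, date(2021, 1, 15))
print(newest)
```

The `max_months` cap is a design choice for the sketch: it stops the loop from walking back indefinitely when there are fewer than N posts in total.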
For the fourth access pattern, you'd like to fetch the most liked posts. One way to implement this access pattern is to define another GSI with "LIKES" as the partition key and the number of likes as the sort key.
If you intend on introducing a data range on the number of likes (e.g. most popular posts this week/month/year/etc) you could utilize the truncated timestamp approach I outlined for the previous access pattern.
When you find yourself with "fetch most recent" access patterns, you may want to check out KSUIDs. KSUIDs, or K-Sortable Unique Identifiers, are unique identifiers that are sortable by their creation date/time. Think of them as UUIDs and timestamps combined into one attribute. This could be useful in supporting your first access pattern, where you are fetching the most recent posts for a user. If you were to use a KSUID for the Post ID, your table would look like this:
I've replaced the POST ID's in this example with KSUIDs. Because the KSUIDs are unique and sortable by the time they were created, you are able to support your first access pattern without any additional indexing.
There are KSUID libraries for most popular programming languages, so implementing this feature is pretty simple.
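To show why such identifiers sort by creation time, here is a simplified, KSUID-style generator. It is an illustration only: the layout (a 4-byte timestamp followed by 16 random bytes, base62-encoded to 27 characters) and the epoch value are written from memory and should be treated as assumptions, so use a maintained KSUID library in a real application.

```python
import os
import time

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH = 1_400_000_000  # KSUID-style custom epoch (assumed for this sketch)

def sortable_id(timestamp=None):
    """Return a 27-character identifier whose string order follows its timestamp."""
    seconds = int(time.time() if timestamp is None else timestamp) - EPOCH
    raw = seconds.to_bytes(4, "big") + os.urandom(16)  # 4-byte time + 16 random bytes
    number = int.from_bytes(raw, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Left-pad to a fixed width so that string order follows numeric order
    return "".join(reversed(digits)).rjust(27, "0")

older = sortable_id(1_600_000_000)
newer = sortable_id(1_600_000_001)
print(older < newer)
```

Identifiers created within the same second differ only in their random part, so their relative order is arbitrary.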
You could add two Global Secondary Indexes.
For 3):
Create a static attribute type with the value post, which serves as the Partition Key for the GSI and use the attribute UploadTime as the Sort Key. You can then query for type="post" and get the most recent items based on the sort key.
The solution for 4) is very similar:
Create another Global secondary index with the aforementioned item type as the partition key and Likes as the sort key. You can then query in a similar way as above. Note, that GSIs are eventually consistent, so it may take time until your like counters are updated.
Explanation and additional information
Using this approach you group all posts in a single item collection, which allows for efficient queries. To save on storage space and RCUs, you can also choose to only project a subset of attributes into the index.
If you have more than 10GB of post-data, this design isn't ideal, but for a smaller application it will work fine.
If you're going for a Single Table Design, I'd recommend using generic names for the index attributes: PK, SK, GSI1PK, GSI1SK, GSI2PK, GSI2SK. You can then duplicate the attribute values into these attributes. This will make it less confusing if you store different entities in the table. Adding a type attribute that holds the entity type is also common.
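A post item with generic key attributes might be assembled as in the sketch below. The function name and the `USER#`/`POST#` prefixes are assumptions made for illustration; the static `post` value and the two GSIs follow the answer above.

```python
def post_item(user_id, post_id, upload_time, likes):
    """Build one post item with generic key attributes for the table and two GSIs."""
    return {
        "PK": f"USER#{user_id}",
        "SK": f"POST#{post_id}",
        "type": "post",
        # GSI1: all posts ordered by upload time
        "GSI1PK": "post",
        "GSI1SK": upload_time,
        # GSI2: all posts ordered by like count
        "GSI2PK": "post",
        "GSI2SK": likes,
        # The original attributes are kept alongside their duplicated key copies
        "UploadTime": upload_time,
        "Likes": likes,
    }

# Hypothetical sample values
item = post_item("42", "1001", "2021-01-15T10:30:00Z", 17)
print(item["GSI1PK"], item["GSI1SK"])
```

Because the values are duplicated, every update to `Likes` also has to update `GSI2SK` in the same write.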

Grab value from unrelated object where two fields match

I have two objects - "Account" and "Appointment". I'm trying to pull the value of the field "Status" from the "Appointment" object where "Account.Initial_Date" matches "Appointment.Date_Time". I initially tried making a new field in the "Account" object to return a text field and see if maybe it would return the first value:
Appointment__c.Status__c
Which results in the error:
"Field Appointment__c does not exist. Check spelling."
I was told that it's too difficult to link from "Appointment" to "Account" because there can be multiple appointments per account, which is why I'm trying to link based on the date fields. My next attempt was using VLOOKUP, but I read that this only works between custom objects, and I think I'm working with standard objects here... what kind of solution should I be looking for?
Adding the tag apex here in case this can only be achieved via a script of some sort. If that's the case, I'll make an attempt at that.
I was told that it's too difficult to link from "Appointment" to "Account" because there can be multiple appointments per account
This is incorrect. That relationship appears to be exactly the same as the one between Contact and Account: one Account, many Contacts. It's a very common relationship pattern in Salesforce.
If an Appointment is logically related to an Account, it should have a relationship field referencing the Account object to which it is related.
However, having a one-to-many relationship does not mean you can trivially represent specific data points from the many side to the one side. The native tool to do so is the Roll-Up Summary Field, but it does not apply to your use case.
There are really three ways to implement your objective, which is essentially a variant of a roll-up summary. VLOOKUP(), which works only in Validation Rules, does not apply here.
Write two Apex triggers (one on Account and one on Appointment) to react to all changes that would influence what value should appear in the Account__c.Status__c field.
Write equivalent Process and Flow declarative automation, which cannot get 100% of the way there because Process Builder and Flow cannot react to delete events.
Use the free and open source Declarative Lookup Rollup Summaries application to define a roll-up summary. DLRS can populate a field from the child object (Appointment) to the parent (Account) based on a sorting by another field (Date_Time__c).
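Whichever of the three options is used, the matching logic it has to implement is the same. The sketch below shows that logic in Python purely as an illustration; it is not Apex and not a DLRS configuration, and the dictionary keys simply mirror the field names from the question.

```python
from datetime import datetime

def status_for_account(initial_date, appointments):
    """Return the Status of the appointment whose Date_Time equals the account's Initial_Date."""
    matches = [a for a in appointments if a["Date_Time"] == initial_date]
    if not matches:
        return None  # nothing to copy onto the account
    # If several appointments share the timestamp, the last one wins (arbitrary tie-break)
    return matches[-1]["Status"]

# Hypothetical sample data
appointments = [
    {"Date_Time": datetime(2021, 3, 1, 9, 0), "Status": "Completed"},
    {"Date_Time": datetime(2021, 4, 12, 14, 30), "Status": "Scheduled"},
]
print(status_for_account(datetime(2021, 3, 1, 9, 0), appointments))
```

The value has to be recomputed whenever the account's date changes or a related appointment is inserted, updated or deleted, which is why the trigger option needs handlers on both objects.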

Cannot insert duplicate key row in object with unique index . The duplicate key value. The statement has been terminated

I am new to OutSystems and SQL. I am trying to create a Bus Application where the entities are
When I try to create a new rider with the same name and a different Route and Bus Id, I get
Cannot insert duplicate key row in object 'dbo.OSUSR_6SL_RIDER' with unique index 'OSIDX_OSUSR_6SL_RIDER_4NAME'. The duplicate key value is (ABC).
The statement has been terminated.
When I check the Name field in the database table 'dbo.OSUSR_6SL_RIDER', it does not have a unique constraint set up. Can anybody please help me with this?
Open the Indexes tree under your table. You will find an Index named 'OSIDX_OSUSR_6SL_RIDER_4NAME'.
Script out that Index and you will see that it is a UNIQUE index on a "name" column that you are trying to create a duplicate value in.
You must either change that Index to include Route and Bus ID, or you must abandon your attempt to create a new row with a duplicate name.
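The difference between a unique index on the name alone and one that also covers route and bus can be reproduced in a few lines. This sketch uses SQLite from the Python standard library instead of SQL Server, and the table, column and index names are hypothetical, so it demonstrates the behaviour only and is not a script for the OutSystems database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rider (id INTEGER PRIMARY KEY, name TEXT, route_id INTEGER, bus_id INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_rider_name ON rider (name)")
conn.execute("INSERT INTO rider (name, route_id, bus_id) VALUES ('ABC', 1, 10)")

try:
    # Same name, different route and bus: rejected by the name-only unique index
    conn.execute("INSERT INTO rider (name, route_id, bus_id) VALUES ('ABC', 2, 20)")
    name_only_result = "inserted"
except sqlite3.IntegrityError as error:
    name_only_result = f"rejected: {error}"

# Replace the name-only index with one that also covers route and bus
conn.execute("DROP INDEX idx_rider_name")
conn.execute("CREATE UNIQUE INDEX idx_rider_name_route_bus ON rider (name, route_id, bus_id)")
conn.execute("INSERT INTO rider (name, route_id, bus_id) VALUES ('ABC', 2, 20)")
row_count = conn.execute("SELECT COUNT(*) FROM rider").fetchone()[0]

print(name_only_result)
print(row_count)
```

With the composite index, two riders may share a name as long as the combination of name, route and bus differs.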
It looks like you are creating an exact duplicate, i.e. a record with the same Id value. The index name it refers to seems to be auto-generated by the database system. Therefore, it is not necessarily referring to the Name field. Have a look at your indexes and at the fields they contain. I wouldn't be surprised if OSIDX_OSUSR_6SL_RIDER_4NAME contains the Id field.
If you are using the OutSystems platform, all the database management is done/generated when you publish from Service Studio, so it isn’t advisable to manipulate the database directly: you’re setting yourself up for a lot of maintenance pain and inconsistencies between different environments.
Double-click on the Entity Rider and it’ll open the edit window of your entity. In the Indexes tab you can define and change your indexes (unique or not) and the tool will (re)generate all the needed SQL commands.
See OutSystems Platform 9 Help | Indexes Tab for more details.

Best solution for hierarchical and non-typed data

I have a db schema like this:
ELEMENT(uuid[string], name[string], status[integer], ...)
APPLICATION(uuid[string], name[string], config[string], status[integer], parent[foreign key on self or foreign key on ELEMENT], ...)
ATTRIBUTE(uuid[string], name[string],type[string], parent[foreign key on APPLICATION], ...)
VALUES(created[datetime], data[varchar|text|integer|float|boolean|datetime])
And I have these constraints:
1. An Application must have a foreign key on ELEMENT or on itself
2. Attribute could grow up to 250,000 values
3. For every Attribute I have 20,000 Values
4. The VALUES data column can have different types
5. On user request I need to serve all Applications that match a specific Element
6. On user request I need to serve all Attributes, and their last value, that match a specific Application
7. On user request I need to serve all Values of a specific Attribute in an arbitrary datetime range
8. At regular intervals (x minutes) new values are added and the oldest are removed
9. When new values are added I might need to access the last X values of an Attribute
10. When someone is modifying an item of ELEMENT, APPLICATION or ATTRIBUTE, I need to lock that item
11. I need persistent data storage and good performance when users make requests
I cannot use a classic RDBMS because of constraint 4, and I think a relational database is not the best solution in any case. I think the best solution is to use a NoSQL database, but which one? There are a lot of solutions with pros and cons, but I don't know which approach best fits my needs.
Thanks
