Query two custom objects joining on the Name field - salesforce

I want to create a join on two custom objects joining on the Name field. Normally joins require a lookup or master-detail relationship between the two objects, but I just want to do a text match.
I think this is a Salesforce limitation but I couldn't find any docs on whether this was so. Can anyone confirm this?

Yes, you can make a join (with dot notation or as a subquery) only if there's a relationship present. And relationships (lookup or master-detail) can be made only by Id. There are several "mutant fields" (like Task.WhoId), but generally speaking you can't write a JOIN in SOQL, and you certainly can't use a text column as a foreign key.
http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_relationships.htm#relate_query_limits
Relationship queries are not the same as SQL joins. You must have a relationship between objects to create a join in SOQL.
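To illustrate (the object and field names below are made up): if Child__c has a lookup field Parent__c pointing at Parent__c, you can traverse the relationship with dot notation or a subquery:

SELECT Name, Parent__r.Name FROM Child__c WHERE Parent__r.Name = 'Acme'
SELECT Name, (SELECT Name FROM Children__r) FROM Parent__c

But there's no SOQL equivalent of an ad-hoc SQL join on a text column - something like SELECT ... FROM A__c JOIN B__c ON A__c.Name = B__c.Name simply isn't part of the language.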
There are some workarounds though. Why exactly do you need the join?
Apex / SOQL - have a look at "SOQL in Apex - Getting unmatched results from two object types", for example. It's not the prettiest thing in the world, but it works. If you want to try something really crazy - maybe SOSL, which can search your 2 objects at the same time?
Reports - you should have no problem grouping by a text field, which means a joined report might give you the results you're after. Since Winter '13, joined reports allow charts and exporting; the lack of those used to be quite a limiting factor...
Easy building of links between data - use external IDs and the upsert operation, especially if you plan to load data from outside SF (see the sketch after this list). Check my answer to "Can I insert deserialized JSON SObjects from another Salesforce org into my org?"
Uniqueness constraints - you can still mark fields as required & unique.
Check against "dictionary" of allowed values - Validation rule with VLOOKUP might do what you're after.
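For the external-ID route, here's a minimal Apex sketch (the object and field names are made up; Ext_Key__c is assumed to be a unique external ID field on Parent__c):

// Upsert keyed on the external ID instead of the Salesforce Id.
List<Parent__c> parents = new List<Parent__c>{
    new Parent__c(Ext_Key__c = 'ABC-123', Name = 'Parent one')
};
// Matches on Ext_Key__c: existing rows are updated, new keys are inserted.
Database.upsert(parents, Parent__c.Ext_Key__c, true);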

Related

Single Datascript Query Challenge in LogSeq

I'm trying to learn Datascript in the context of LogSeq, and I've stumbled into something I'm not sure how to solve.
The Fundamental Problem
I'm trying to query for a subset of entities that have NOT been referenced by attributes on a different group of filtered entities.
The Background
General LogSeq schema: https://github.com/logseq/logseq/blob/master/deps/db/src/logseq/db/schema.cljs
LogSeq Documentation for Datascript: https://docs.logseq.com/#/page/advanced%20queries
I've got a set of entities, with :block/properties like so:
tags:: contact
list:: C
Other entities have :block/refs to these pages.
I'm trying to create a query that shows me the contacts in a given list (A|B|C) that have NO notes within the past two weeks.
In SQL this would be a straightforward left join, but I'm having trouble translating that to Datalog, since the information lives in two different entity groups (instead of attributes all on the same entity). I assume there's some sort of not-join to filter out the contacts that have recent refs, but since that data is in other entities, I'm not sure how to structure the query: my implicit joins knock out either one group or the other (rough sketch of my attempt below).
I should add, because this is in LogSeq, I can't do two separate queries and join them in code. It has to be in one go.
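Roughly the shape I've been attempting (attribute names are guesses from the schema linked above; I'm assuming the tags property comes back as a set, and the journal-day literal is just a placeholder):

[:find (pull ?contact [*])
 :where
 [?contact :block/properties ?props]
 [(get ?props :tags) ?tags]
 [(contains? ?tags "contact")]
 [(get ?props :list) ?list]
 [(= ?list "C")]
 (not-join [?contact]
   [?note :block/refs ?contact]
   [?note :block/page ?page]
   [?page :block/journal-day ?day]
   [(>= ?day 20240401)])]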
Thank you!

Implementing tag system with multiple tables

I am trying to implement a tag system similar to the one Stack Overflow has. Obviously I've read multiple articles, including this answer.
However, my scenario is a little bit different:
there will be a limited number of tags, which can be created only by users with higher privileges (anybody can assign a tag, though). I guess this rules out option #1 from the SO question I linked above (each tag inserted directly into the table's tags column and then queried with LIKE)
there are also multiple tables in DB which can be tagged (currently five)
Especially the second criterion makes it harder, so these are my thoughts:
I could follow option #3: have a tags table with an M:N relationship to each taggable table. However, that would make searching harder (imagine that join as the number of tables grows), and I also need to tell which table (application module) a tag match came from in the search results
I could use some kind of polymorphism, but I am pretty new to this concept as it applies to databases, so is this something that fits this problem well?
I use the newest version of PostgreSQL.
Since you are using PostgreSQL, you have the option of some field types which aren't available for other databases. Particularly, arrays and JSON fields. I did some performance comparisons of the various methods in a blog post. Arrays and JSONB were definitely better options than a tags table for any search which needed to combine multiple tags.
Given that, I would recommend creating a tags column for each table on which you want to have tags, either an array or a JSONB column, depending on your needs. If you need to search over multiple tables, I'd suggest a UNION query (sketched below) instead of a single monolithic tags table which joins to everything.
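For example, a minimal sketch with array columns (the table and column names are made up):

-- One tags array per taggable table.
ALTER TABLE articles ADD COLUMN tags text[] NOT NULL DEFAULT '{}';
ALTER TABLE videos ADD COLUMN tags text[] NOT NULL DEFAULT '{}';

-- GIN indexes make the "contains" operator (@>) fast.
CREATE INDEX articles_tags_idx ON articles USING GIN (tags);
CREATE INDEX videos_tags_idx ON videos USING GIN (tags);

-- Find rows carrying both tags, across modules, labeling each hit with its source table.
SELECT 'articles' AS module, id, title FROM articles WHERE tags @> ARRAY['postgres','performance']
UNION ALL
SELECT 'videos' AS module, id, title FROM videos WHERE tags @> ARRAY['postgres','performance'];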

Datomic table model

I have an application that requires a database containing a set of products where each product can have a set of tables. The end-user should be able to add new products and define new tables for a product. So each table has a set of columns that are specified by the user. The user can then fill the tables with rows of data. Each table belongs to exactly one product.
The end-user should also be able to view the tables as they were at a specific point in time (at a certain transaction).
How would I go about making a schema for this in Datomic so that querying it would be as efficient as possible?
I would go with 4 entity types: products, tables, columns, and rows.
The relationship between products and tables is best handled by a :table/product to-one ref attribute, but a :product/tables to-many component ref attribute could also work (the latter does not enforce the one-to-many relationship).
Likewise, I would use either a :column/table or :table/columns attribute. I would also have a :column/name string attribute and maybe a :column/type enumerated attribute.
The hardest part is to model rows.
One tempting solution is to just create an attribute per column - I actually think it's a bad idea; Datomic attributes are not intended for such dynamic use. In particular, schema attributes are stored in a cache on the Peer that's not meant to grow big. (I may be wrong about this, so it'd be nice if someone on the Datomic team could confirm.)
Instead, I would have a few dozen reusable :row/cell-0, :row/cell-1, :row/cell-2, etc. 'cell position' attributes that are shared across all tables. Each actual column would be mapped to a cell position at creation time by a to-one :column/position attribute.
If the rows can hold several data types, it's a bit more difficult: you'd basically have to make an attribute for each (type, position) pair.
Each row then basically consists of a :row/table attribute plus the cell position attributes above.
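A minimal sketch of that schema, assuming string-valued cells (:column/position is a ref so it can point at the :row/cell-N attribute entity itself):

[{:db/ident       :row/table
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 {:db/ident       :column/table
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 {:db/ident       :column/name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :column/position
  :db/valueType   :db.type/ref ; points at one of the :row/cell-N attribute entities
  :db/cardinality :db.cardinality/one}
 {:db/ident       :row/cell-0
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :row/cell-1
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}]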
Here's a Datalog query that would let you read the whole table:
[:find ?row ?column-name ?val
 :in $ ?table
 :where
 [?column :column/table ?table]
 [?row :row/table ?table]
 [?row ?pos ?val]
 [?column :column/position ?pos]
 [?column :column/name ?column-name]]
Note that all of the above is only useful if you want to query the table with Datalog directly against your Datomic db. It can also be completely fine to serialize your tables and store them as blobs - especially if they're small; later, you pull out the blob and deserialize it, and then you can query it with Datalog too. And if whole tables are too coarse for this use, maybe you can do it with rows.

Database Design for Asset Management

I'm developing an Asset management application.
Looking through the Excel tracker that was being used previously, I was able to identify some attributes that are common to all categories of assets (basically non-technical attributes such as Purchase Order No., Warranty Info, etc.), for which I think I will make a separate table.
But when it comes to storing technical attributes, there are many categories of assets for which I need only one or two additional attributes.
Should I make a single table for all these attributes and store NULLs wherever applicable, or should I make a separate table for each category containing just the asset ID and the additional columns? Which approach is better/more pragmatic?
Is cluttering the database with too many tables ok? I have around 10 such categories.
There are 3 known approaches to this:
Single table
In this model, you have a single table with all known columns, allowing them to be null for types that don't have that attribute. This gives you a simple database and fairly simple SQL, but you forgo common features that relational databases give you, like insisting on non-null columns for a data type, or creating unique indices where that makes sense.
It also tends to lead to messy SQL, with developers forgetting over time what columns mean, so you could end up with a column being used for multiple purposes.
It does make it easy to join to other tables - so if you have an asset and a purchase related to that asset, the "purchase" table joins to the "asset" table on "assetID".
Table per subtype
In this case, you build a table for each subtype, and enforce the data characteristics of that subtype with NOT NULL, UNIQUE, etc.
This creates a clearer separation of subtypes and is less likely to degrade into a big ball of mud, but it makes joins very hard - to join from "purchase" to "asset", you have to know which table holds that particular asset.
Common table for common fields, table per subtype
In this model, you have a single table for the fields that are common between subtypes - you say you've identified this already - and have further tables for each subtype to store the unique attributes.
This solves the joining problem between "asset" and "purchase" and keeps the data pretty self-describing.
It does mean client logic needs to handle the "join asset_master to asset_subtype" step.
I prefer option 3 - it's the best trade-off between maintainability and manageability.
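As a rough sketch in SQL (all names here are invented for illustration):

-- Common fields live in one master table.
CREATE TABLE asset (
    asset_id          integer PRIMARY KEY,
    category          varchar(30) NOT NULL, -- tells client code which subtype table to join
    purchase_order_no varchar(50),
    warranty_info     varchar(200)
);

-- One table per category, enforcing that category's constraints.
CREATE TABLE asset_laptop (
    asset_id  integer PRIMARY KEY REFERENCES asset (asset_id),
    ram_gb    integer NOT NULL,
    serial_no varchar(50) UNIQUE
);

-- Other tables always join to the common table, whatever the category:
-- SELECT ... FROM purchase p JOIN asset a ON a.asset_id = p.asset_id;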
Databases should be able to handle lots of columns and lots of tables, so both approaches should work from that perspective.
If you don't have any additional requirements, I'd use the single-table approach. It is the easiest, and the only thing you are losing is the ability to put NOT NULL constraints on the fields that exist only for some categories.

Database design - do I need one or two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track of whether an application uses a database or a bug tracking system, so right now I have fields in the Applications table called:
Table: Applications
    UsesDatabase (bit)
    Database_ID (int)
    UsesBugTracking (bit)
    BugTracking_ID (int)

Table: Databases
    id
    name

Table: BugTracking
    id
    name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of applications that use bug tracking" (although I guess either approach could generate this data).
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
    count(*) as TotalApplications,
    count(Database_ID) as UsesDatabase,
    count(BugTracking_ID) as UsesBugTracking
from
    Applications

(count over a column counts only the non-NULL values, which is what makes this work.)
Why not get rid of the two Uses fields and simply let a NULL value in the _ID fields indicate that the record does not use that feature (bug tracking or database)?
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would each have an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. With this setup, an application could have no databases (no records for this app in the ApplicationDatabases table) or many databases (multiple records for this app in the ApplicationDatabases table).
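A sketch of what that could look like (assuming Applications has an integer id primary key; names are illustrative):

CREATE TABLE ApplicationDatabases (
    application_id int NOT NULL REFERENCES Applications (id),
    database_id    int NOT NULL REFERENCES Databases (id),
    PRIMARY KEY (application_id, database_id)
);

-- "Percent of applications that use a database":
SELECT 100.0 * COUNT(DISTINCT ad.application_id) / COUNT(DISTINCT a.id) AS pct_with_db
FROM Applications a
LEFT JOIN ApplicationDatabases ad ON ad.application_id = a.id;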
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes, using NULL in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though database people might consider it evil ^^) is to default them to 0 and add an ID 0 row to both the BugTracking and Databases tables with a name of "None"... When you do the reports, you'll have to do some more work, unless you present the "None" values as they are, along with a neat percentage...
To answer the edited question: yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).
