What is the simplest way to quickly create edges in ArangoDB programmatically?
I would like to create relationships between documents based on a common attribute. I'd like to be able to select an attribute, and for every document in collection A, create an edge to every document in collection B that has the same value in an equivalent attribute.
For example, if I've imported email messages into one collection and people into another, I would like to generate edges between the emails and the people. An email's schema might look like this:
{
  "_key":
  "subject":
  "body":
  "from":
  "to":
}
And a person's schema might look like this:
{
  "_key":
  "name":
  "email":
}
Let's say that the values in the from and to fields in the email messages correspond to email addresses that we may find in the people collection.
I'd like to be able to take the collections, attributes, and edge parameters as input, then, for every document in the people collection, create an edge to every document in the email collection that has the same email address in the from attribute as the current document's email attribute.
So far, I think that Foxx may be the best tool for this, but I am a bit overwhelmed by the documentation.
Eventually, I'd like to create a full CRUD for edges defined by shared attributes between documents, including an "upsert" equivalent: updating an edge if it already exists and creating it if it doesn't.
I know that doing this with individual calls to the standard HTTP API would be far too slow, since I would need to query Arango for every document in a collection and handle very large result sets.
Is there already a Foxx service that does this? If not, where should I start to create one?
A single AQL query should suffice:
FOR p IN people
  FOR e IN emails
    FILTER p.email == e.from
    INSERT {_from: p._id, _to: e._id} INTO sent
The email addresses in the vertex collection people are matched with the from email addresses of the emails vertex collection. For every match, a new edge is inserted into an edge collection sent, linking people and email records.
If both vertex collections contain a small number of documents, it is okay to execute this query without indexes (e.g. 1,000 persons and 3,000 emails took about 2 seconds in my test). For larger datasets, make sure to create a hash index on email in people and a hash index on from in emails. This reduced the execution time to about 30 ms in my test.
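Since you want to drive this programmatically with the collections and attributes as input, here is a minimal sketch using the python-arango driver; the host, database name, and credentials are placeholders, and the UPSERT variant covers the "update if it exists, create if it doesn't" behaviour you mentioned:

from arango import ArangoClient

# Connection details are assumptions; adjust for your deployment.
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="secret")

# Generic edge-building query: the vertex collections, the edge
# collection, and both attribute names are bind parameters, so the
# same query serves any pair of collections.
AQL = """
FOR a IN @@colA
  FOR b IN @@colB
    FILTER a.@attrA == b.@attrB
    UPSERT { _from: a._id, _to: b._id }
      INSERT { _from: a._id, _to: b._id }
      UPDATE {}
      IN @@edges
"""

def link_by_attribute(col_a, col_b, edges, attr_a, attr_b):
    db.aql.execute(AQL, bind_vars={
        "@colA": col_a,    # e.g. "people"
        "@colB": col_b,    # e.g. "emails"
        "@edges": edges,   # e.g. "sent"
        "attrA": attr_a,   # e.g. "email"
        "attrB": attr_b,   # e.g. "from"
    })

# Hash indexes on the join attributes keep the nested loops fast.
db.collection("people").add_hash_index(fields=["email"])
db.collection("emails").add_hash_index(fields=["from"])

link_by_attribute("people", "emails", "sent", "email", "from")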
Related
I am working on a Contract/Invoicing project. I have made the contract list, and I need to add the contracts in PDF or Word form.
What is the best way to integrate attachments in SharePoint: do I have to use a document library, or is it better to work with an attachment column in the contract list?
There are benefits to each approach.
Use a library when:
The focus is the document; the metadata/columns are secondary.
Each document is unique, not part of a set.
The content of the document needs to be indexed for search, and uniquely found. (The link in the search results points to the document.)
Use a list with attachments when:
The focus is the metadata/columns and there may be zero or many related documents.
You need a list item even if there is no document, or no document yet.
It's OK when you search for a keyword found in one of the documents and you get the list item returned, not the document. (You would then need to open each attachment to find the searched-for term; i.e., the link in the search results points to the list item, not the document.)
Use a Document Set when:
You need a focus on a set of documents, kind of like a folder with metadata that contains zero to many files.
You need to automatically create a collection of documents each time you create a new Document Set. (New hire form, parking permit, photo id appointment, training schedule, etc.)
You like having a "home page" / information panel for each set of documents.
I have two potential collections: 1) Accounts and 2) Jobs. The Accounts collection stores "account" documents, which contain info about a user account. Fields include username, email, password, description, etc.
The Jobs collection contains "Job" documents, which contain information about a job, with fields such as: title, description, date created, relevant skills, etc.
Essentially, users can create jobs, and thus these two collections are tied together, since I would like to display the jobs on the user Dashboard UI. I have thought of 3 scenarios to achieve this:
Each Job document has a UserID field, which acts as a pointer to the corresponding account that created the job. When the dashboard loads, a query is done in the jobs collection for all UserID fields matching the accountID.
Each Account document has a field JobIDs, which is an array of "pointers" that references each of the JobIDs that were created by an account. When the dashboard loads, it loops through this array, performing a query based on the JobID to find the corresponding jobs in the Jobs collection.
On each Account document, there is simply a jobs field, which is an array of objects, and each object represents a job.
Which is the best implementation and why? I realize for choice 3, there seems to be the benefit of not having to do any queries, but with the downside of potentially huge documents. With the "pointer" implementation, would it be slow to search an entire collection?
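To make the trade-off concrete, here is roughly what scenario 1 looks like with pymongo, assuming a MongoDB backend; the database and field names are illustrative:

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]  # database name is a placeholder

# Scenario 1: each job document carries a userId "pointer" back to
# the account that created it. An index on that field turns the
# dashboard query into a cheap lookup rather than a collection scan.
db.jobs.create_index([("userId", ASCENDING)])

def jobs_for_dashboard(account_id):
    return list(db.jobs.find({"userId": account_id}))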
I want to create a multi-select Contact Lookup.
What I want:
When a user clicks on the lookup, they should be able to select multiple contacts from it.
What I have done:
I have created an object and a field inside that object, trying each of
"Lookup",
"Master-Detail Relationship", and
"Junction Object".
Whenever I try to use this field for any input text/field, it always lets me select only one value from the lookup, but I want the option to select multiple.
Even in the junction object, where I have created two master-detail relationships, the lookup still allows only one value to be selected. Moreover, it makes the field mandatory, which I don't want.
Links that I followed:
http://success.salesforce.com/questionDetail?qId=a1X30000000Hl5dEAC
https://ap1.salesforce.com/help/doc/user_ed.jsp?loc=help&section=help&hash=topic-title&target=relationships_manytomany.htm
Can anybody suggest how to do this?
It's the same as the Email CC/BCC lookup under the Send Email option for a Lead.
Even if you use a junction object, a lookup is just that: it references (looks up to) one other record. When you create a record on the junction object, you still have to set each lookup individually, and you're still creating only one record.
Master-detail relationships are essentially lookups on steroids: one object becomes the child of the other and will be deleted if the parent object is deleted. They're not going to provide an interface for looking up many records at once.
If you're not a developer, then your best bet is to either just create one junction object record at a time, or look into using the Data Loader. You could prepare your data in Excel or similar and then upload all the records into Salesforce in one go.
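If a short script suits you better than the Data Loader UI, the same bulk-upload idea only takes a few lines; here is a rough sketch with the simple-salesforce Python library, where ContactLink__c and its two relationship fields are made-up names standing in for your own junction object:

from simple_salesforce import Salesforce

# Credentials are placeholders.
sf = Salesforce(username="me@example.com", password="pw",
                security_token="token")

account_id = "001000000000001AAA"           # parent account (placeholder id)
contact_ids = ["003000000000001AAA",        # contacts to link (placeholder ids)
               "003000000000002AAA"]

# One junction record per contact, inserted in a single bulk call.
records = [{"Account__c": account_id, "Contact__c": cid}
           for cid in contact_ids]
sf.bulk.ContactLink__c.insert(records)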
If you are a developer, or have developers at your disposal, then what we've done in the past is create a Visualforce page to do the job. So if, for example, you wanted to link a bunch of contacts up to an Account, we'd have a single account lookup field on the page, then some search fields relating to fields on the contact. Using a SOQL query you can then find all contacts matching the search parameters and display them in a list, where you may want to provide checkboxes to allow the user to select the contacts they want. Then it's just a case of looping through the selected contacts, setting their Account field to be the chosen account.
There are areas in Salesforce (such as the send Email functionality you mentioned) where it's clear to see that bespoke work has been done to fulfil a specific task — another instance of what you want is in the area where you can manage campaign members. This is the model I've copied in the past when implementing a Visualforce page as described.
Good luck!
For adding multiple junction objects at one time, the only solution we have found is a custom Visualforce page, as described by LaceySnr.
For a slightly different problem, where we need to assign many of object B to object A, We have trained our users to do this with a view on object B. We are assigning Billing Accounts (B) to Payment Offices (A). The view on Billing Account has check boxes on the left side. The user checks the Billing Accounts to be assigned, then double-clicks on the Payment Office field on any of the checked rows. A pop-up asks if you want to update only the single row or all checked rows. By selecting 'all checked rows', the update is done to all of them.
The view is created by the user, who enters the selection criteria (name, address, state, etc.). All user-created views are visible only to them.
I am working on a system, which will run on GAE, which will have several related entities and I am not sure of the best way to store the data. This post is a request for advice from others who may have similar experience....
The system will have users, with profile data and an image. Those users will be able to create "events" and add journal entries to them. For the purposes of the system, the "events" will likely have 1 or 2 journal entries in them, and anything over 10 would likely never happen. Other users will be able to add comments to users' entries as well, and popular ones may have hundreds or even thousands of comments.

When a random visitor uses the system, they should be able to see the latest events (latest being defined as those with the latest journal entries in them), search by tag, and perform a very basic text search. Upon selecting an event to view, it should be displayed with all journal entries and all user comments, with user images alongside comments. A user should also have a kind of self-admin page, to view/modify/delete their events and to view/modify/delete comments they have made on other events.

Doing all this on a normal RDBMS would just be queries with some big joins across several tables. On GAE it would obviously need to work differently. Here are my initial thoughts on the design of the entities:
Event entity - id, name, timestamp, list property of tags, view count, creator's username, creator's profile image id, number of journal entries it contains, number of total comments it contains, timestamp of last update to contained journal entries, list property of index words for search (built/updated from text from contained journal entries)
JournalEntry entity - timestamp, journal text, name of event, creator's username, creator's profile image id, list property of comments (containing commenter username and image id)
User entity - username, password hash, email, list property of subscribed events, timestamp of create date, image id, number of comments posted, number of events created, number of journal entries created, timestamp of last journal activity
UserComment entity - username, id of event commented on, title of event commented on
TagData entity - tag name, count of events with tag on them
So, I'd like to hear what people here think about the design and what changes should be made to help it scale well. Thanks!
Rather than store Event.id as a property, use the id automatically embedded in each entity's key, or set unique key names on entities as you create them.
You have lots of options for modeling the relationship between Event and JournalEntry: you could use a ReferenceProperty, you could parent JournalEntries to Events and retrieve them with ancestor queries, or you could store a list of JournalEntry key ids or names on Event and retrieve them in batch with a key query. Try some things out with realistically-distributed dummy data, and use appstats to see what works best.
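For instance, the ancestor option looks like this with the ndb API (a sketch; the same idea works with ext.db): each JournalEntry is created with its Event as parent, and one ancestor query brings back all of an event's entries.

from google.appengine.ext import ndb

class Event(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.StringProperty(repeated=True)

class JournalEntry(ndb.Model):
    text = ndb.TextProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

event_key = Event(name="launch party").put()

# Parenting the entry to the event puts both in one entity group...
JournalEntry(parent=event_key, text="first entry").put()

# ...so a single ancestor query fetches every entry for the event.
entries = JournalEntry.query(ancestor=event_key).fetch()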
UserComment references an Event, while JournalEntry references a list of UserComments, which is a little confusing. Is there a relationship between UserComment and JournalEntry, or just between UserComment and Event?
Persisting so many counts is expensive. When I post a comment, you're going to write a new UserComment entity and also update my User entity and a JournalEntry entity and an Event entity. The number of UserComments you expect per Event makes it unwise to include everything in the same entity group, which means you can't do these writes transactionally, so you'll do them serially, and the entities might be stored across different network nodes, making the whole operation slow; and you'll also be open to consistency problems. Can you do without some of these counts and consider storing others in memcache?
When you fetch an Event from the datastore, you don't actually care about its list of search index words, and retrieving and deserializing them from protocol buffers has a cost. You can get around this by splitting each Event's search index words into a separate child EventIndex entity. Then you can query EventIndex on your search term, fetch just the EventIndex keys for EventIndexes that match your search, derive the corresponding Events' keys with key.parent(), and fetch the Events by key, never paying for the retrieval or deserialization of your search index word lists. Brett Slatkin explains this strategy here at 14:35.
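A sketch of that pattern in ndb (entity and property names assumed): the keys-only query never deserializes the word lists, and key.parent() leads straight to the Events.

from google.appengine.ext import ndb

class EventIndex(ndb.Model):
    # Child entity of an Event that holds only the search index words.
    words = ndb.StringProperty(repeated=True)

def search_events(term, limit=20):
    # Keys-only fetch: the word lists are never retrieved or deserialized.
    index_keys = EventIndex.query(EventIndex.words == term) \
                           .fetch(limit, keys_only=True)
    # Each EventIndex is parented to its Event, so parent() is the Event key.
    return ndb.get_multi([key.parent() for key in index_keys])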
Updating Event.viewCount will fail if you have a lot of views for any Event in rapid succession, so you should try out counter sharding.
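The usual shape of a sharded counter, sketched with ndb (the shard count is arbitrary): each view increments one randomly chosen shard, and reads sum all shards, so concurrent writes rarely contend.

import random

from google.appengine.ext import ndb

NUM_SHARDS = 20  # more shards = more write throughput

class ViewCountShard(ndb.Model):
    count = ndb.IntegerProperty(default=0)

@ndb.transactional
def increment_views(event_key):
    shard_id = "%s-%d" % (event_key.id(), random.randint(0, NUM_SHARDS - 1))
    shard = ViewCountShard.get_or_insert(shard_id)
    shard.count += 1
    shard.put()

def view_count(event_key):
    keys = [ndb.Key(ViewCountShard, "%s-%d" % (event_key.id(), i))
            for i in range(NUM_SHARDS)]
    return sum(s.count for s in ndb.get_multi(keys) if s)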
Good luck, and tell us what you learn by trying stuff out.
I have 10-12 items for which I need to maintain a blocklist on my system. Which design is better? These are sample columns; there are many more items to block.
table 1
b_id
b_email
b_name
b_username
b_pagename
b_word
b_IP
comments
table 2
b_id
b_type
text
comments
Basically, in table 1 each blocked item has a value in one column only; the rest are all NULL.
In table 2, each blocked item resides in the single text column, so there are no NULLs.
Other designs are possible too, like a separate table for each item, but then there would be lots of tables just to hold blocklists.
EDIT: This data is used to block users from performing certain activities. Each blocked item is used in different places. Example:
block_IP = list of IP addresses that the website will block based on detected user's IP
block_name = list of restricted first/last names users cannot use to signup with
block_email = list of restricted emails users cannot use to signup with
block_username = list of restricted usernames users cannot use to get a profile name
block_pagenames = list of restricted page names users cannot create
block_word = abusive words which users cannot use within content of comments, blogs, etc.
and the list goes on...
So basically these are all like individual lookup items. In an ideal world we would have separate tables for each item, but I don't like the idea of having 20-30 tables just to hold blocked item values. There should be an easier way to manage all this. The only issue is that some items, like block_word, can grow to millions of rows, since there are a lot of words that can be blocked in many languages.
Check out the Entity-Attribute-Value approach or use a schemaless NoSQL datastore.
http://en.wikipedia.org/wiki/Entity-attribute-value_model
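As a concrete sketch of that idea, which is essentially your table 2, here is the type/value design using Python's built-in sqlite3 purely for illustration; the composite index on (b_type, value) keeps lookups fast even if the word list grows into millions of rows:

import sqlite3

conn = sqlite3.connect("blocklist.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS blocklist (
    b_id     INTEGER PRIMARY KEY,
    b_type   TEXT NOT NULL,    -- 'ip', 'email', 'username', 'word', ...
    value    TEXT NOT NULL,
    comments TEXT
);
-- One composite index serves every lookup, whatever the type.
CREATE UNIQUE INDEX IF NOT EXISTS idx_type_value
    ON blocklist (b_type, value);
""")

def is_blocked(b_type, value):
    row = conn.execute(
        "SELECT 1 FROM blocklist WHERE b_type = ? AND value = ?",
        (b_type, value)).fetchone()
    return row is not None

conn.execute("INSERT INTO blocklist (b_type, value) VALUES (?, ?)",
             ("email", "spam@example.com"))
print(is_blocked("email", "spam@example.com"))  # True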
If you're processing the 'blocking' in the middle tier, you can just dump the lists as serialized objects (e.g. JSON) into the table.
I assume you're trying to do something like access control lists, which, depending on your platform, you might be able to find a plugin for.