I'm new to OrientDB. I would like to use its functionality, but I'm a little confused about the usage of the #rid value.
I have read the web documentation of the application and the group discussions, and I am trying to figure out the concepts.
I apologise for any cross-posting about the issue. I couldn't understand the usage of (#rid). In an online application, how do we understand and use the #rid value of a vertex or node?
Do we have to use all of them? How can we traverse without using #rid? How can we be sure about the 11:4 value in a framework that produces this kind of dynamic query?
select from 11:4 where any() traverse(0,10) (address.city = 'Rome')
This is the Record ID (rid for short) of an OrientDB database. It uniquely identifies a record within the database. Some databases use a globally unique identifier (MongoDB) or a primary key (RDBMS). These are all similar concepts.
Because OrientDB is "several databases in one", a record translates to:
Document, if using the document database (API)
Vertex or Edge, if using the Graph database (API)
Object, if using the Object Database (API)
So in your example, 11:4 means the fifth record (the first record is 0) in cluster 11 (the first cluster is also 0). This is an (almost) direct pointer to the physical record in the database, and it becomes the starting point of your traversal. The key thing to understand is that you get very fast access to your data without an index lookup.
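For example (using a hypothetical Customer class; the class and property names here are made up for illustration, not part of your schema), you could first find a record's rid with a normal query:

SELECT @rid, name FROM Customer WHERE name = 'Luca'

If that returns #11:4, you can load the record directly with SELECT FROM #11:4 (no index lookup involved), or use it as the root of a traversal. On recent OrientDB versions, a query in the spirit of the one in your question could also be written with the TRAVERSE syntax (roughly; check the TRAVERSE documentation for your version):

SELECT FROM (TRAVERSE * FROM #11:4 MAXDEPTH 10) WHERE address.city = 'Rome'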
Recently a customer asked how to create a Node with an Edge recursively pointing back to the same Node. The use case was around the concept of a Product "recommending" another Product. Here's a conceptual diagram.
In SQL Server's SQL Graph, any table can be attributed with one of the DDL extensions AS NODE or AS EDGE. When an Edge is created it is not directed or constrained, but with the new CONNECTION keyword, Edges can be constrained to-and-from only specified Nodes. Let's start with the Products table.
CREATE TABLE Products
(
Id INT PRIMARY KEY
, Name VARCHAR(50) NOT NULL
) AS NODE;
This creates an empty table, ready to be filled from your existing relational data and queried with graph queries. You might, for example, want to ask "Does the recommendation chain of THIS product recursively EVER recommend THAT product?" That's a difficult question to ask with a standard TSQL query in any database. It's relatively simple in a graph database.
Aside: This is where the idea of SQL Graph is interesting. If you have a single question well suited for a graph database, why migrate your data to a dedicated graph database and lose out on the capabilities of SQL Server around performance, scalability, high availability, interoperability, reporting, and support? SQL Graph lets you build a little graph right on top of an existing RDBMS structure without any of those potential compromises.
Here's the magic.
CREATE TABLE Recommends
(
CONSTRAINT EC_RECOMMENDS
CONNECTION (Products TO Products)
ON DELETE CASCADE
) AS EDGE
This DDL statement lets you create the Edge you want for recommendation and adds a CONNECTION constraint to ensure the Edge can only be from a Product to a Product and no other Node can participate. Note: you could add ANOTHER constraint if you wanted to reuse this Edge with other Nodes.
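To make the query below concrete, here's a quick sketch of loading a couple of Products and connecting them (the Ids and names are made up; $node_id, $from_id and $to_id are the pseudo-columns SQL Graph adds to node and edge tables):

-- Hypothetical sample rows
INSERT INTO Products (Id, Name) VALUES (123, 'Road Bike'), (234, 'Bike Helmet');

-- An edge row stores the graph identities of the two nodes it connects
INSERT INTO Recommends ($from_id, $to_id)
SELECT p1.$node_id, p2.$node_id
FROM Products AS p1, Products AS p2
WHERE p1.Id = 123 AND p2.Id = 234;  -- Road Bike recommends Bike Helmet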
Now you can answer that question, "Does the recommendation chain of THIS product recursively EVER recommend THAT product?", with a query something like this:
-- Columns of FOR PATH tables can only appear inside graph path aggregates,
-- so the endpoint filter uses LAST_VALUE in an outer query.
SELECT
    StartProduct + '->' + RecommendationPath AS RecommendationPath
FROM
(
    SELECT
        Product.Name AS StartProduct
        , STRING_AGG(Recommendation.Name, '->') WITHIN GROUP (GRAPH PATH) AS RecommendationPath
        , LAST_VALUE(Recommendation.Id) WITHIN GROUP (GRAPH PATH) AS LastProductId
    FROM
        Products AS Product
        , Recommends FOR PATH AS recommends
        , Products FOR PATH AS Recommendation
    WHERE
        MATCH(SHORTEST_PATH(Product(-(recommends)->Recommendation)+))
        AND Product.Id = 123
) AS Paths
WHERE
    LastProductId = 234;
There are several graph-specific functions built into TSQL today. For those that are missing, you have a few interesting options: 1) write your own in TSQL (I have done this on several projects and find it surprisingly straightforward, depending on the algorithm), or 2) filter a subset of the data suited to the algorithm you need and use SQL Server's ML Services capability to expose that data to whatever library your data scientists enjoy most. Having said that, the need to do #2 by exporting your data will be limited to SQL Managed Instance (in Azure) and is super-duper uncommon.
How can I add a new customer segment using only the database? I know how to create customer segments in CMC, but I'm looking to automate the process of adding, say, hundreds of user segments by writing a script to do it for me. However, I can't find any information on how to create a new customer segment using only DB2 database queries.
Is there a way to create a new customer segment using nothing but DB2 database queries?
I would not recommend that you use SQL directly to create customer segments, as this makes generation of primary keys and updates of the KEYS table your responsibility. And once you take that on, Murphy's Law states that you'll get something wrong.
Your question asks how to create "hundreds of user segments". However, I'm not sure if that's what you meant, or if you meant that you had hundreds of users to add to existing segments.
If you're talking about loading hundreds of users, then I'd refer you to this article in the Knowledge Center that explains how you can use the MemberGroupMemberMediator to load segments from e-mail addresses.
If you truly mean to create segments by data load, I'd refer you to this Knowledge Center article that shows how to create member groups. A customer segment is a member group with a particular usage type.
For reference, these are the tables involved:
MBRGRP: The base group (segment) definition
MBRGRPCOND: This is used to define the condition if it is a rule-based segment (e.g. "all shoppers over the age of 25")
MBRGRPDESC: The NLS description (name, etc) of the segment
MBRGRPMBR: For manually defined segments, this defines the members (relationship to MEMBER table)
MBRGRPTYPE: The type of the member group (e.g. "CustomerGroup")
MBRGRPUSG: The usage code for the member group (e.g. "GeneralPurpose")
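If you just want to see how these tables hang together before attempting any load, a read-only query on a development database is harmless. A rough sketch: the MBRGRP_ID join column follows the standard naming convention, but the other column names here are assumptions from memory and may not match your schema exactly:

-- Inspect existing customer segments (MBRGRPNAME and DESCRIPTION are assumed column names)
SELECT g.MBRGRP_ID, g.MBRGRPNAME, d.DESCRIPTION
FROM MBRGRP g
LEFT JOIN MBRGRPDESC d ON d.MBRGRP_ID = g.MBRGRP_ID
ORDER BY g.MBRGRP_ID;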
What version / fixpack / FEP are you working with? Have you read http://www-01.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.management-center.doc/tasks/tsbctsegbatch.htm?lang=en
Technically, changing the DB2 database directly is not officially supported. There are things like stagingprop that depend on certain actions happening in certain ways. For example, primary keys of any rows in any table that is a part of stagingprop cannot be updated. CMC will instead do this for you as a delete and an insert when you make the change through CMC.
That said, I've seen unsupported methods like this used to update/change/create data in WebSphere Commerce databases. I don't have the information specific to how to do this for customer segments. I just caution you that it is dangerous when changing the DB2 database directly, so be sure you have backups and evaluate the impact on other processes like stagingprop or dbclean very carefully.
I've been trying to learn programming for a while. I've studied Java and Python, and I'm comfortable with their syntax. Recently, I wanted to use what I've learnt by coding tangible software from the ground up.
I want to implement a database engine, sort of a NoSQL database. I've put together a small document, a sort of specification to follow throughout my adventure of coding it. But all I know is a bunch of keywords; I don't know where to start.
Can someone help me find out how to gather the knowledge I need for this kind of work, and in what order to learn things? I have searched for documents, but I feel like I'll end up finding unrelated/erroneous content or starting from the wrong point, because implementing a complete database engine seems to be a truly complicated task.
I want to stress that I'd prefer theses, whitepapers and (e)books to the code of other projects, because I've asked this kind of question before and the answers usually come in the form of "read project X's source code". I'm not at the level of comfortably reading and understanding source code.
First, you may have a look at the answers to How to write a simple database engine. While it focuses on a SQL engine, there is still a lot of good material in the answers.
Otherwise, a good project tutorial is Implementation of a B-Tree Database Class. The example code is in C++, but the description of what is done and why is probably what you'll want to look at anyway.
Also, there is Designing and Implementing Structured Storage (Database Engine) over at MSDN. Plenty of information there to help you in your learning project.
Because the accepted answer only offers (good) links to other resources, I thought I'd share my experience writing webdb, a small experimental database for browsers. I also invite you to read the source code. It's pretty small; you should be able to read through it and get a basic understanding of what it's doing in a couple of hours. Warning: I am a n00b at this, and since writing it I have learned a lot more and can see that I did some things wrong. It can help you get started though.
The basics: BTree
I started out by adapting an AVL tree to suit my needs. An AVL tree is a kind of self-balancing binary search tree. You store the key K and related data (if any) in a node, then all items with key < K in the left subtree and all items with key > K in the right subtree. You can use an array to store the data items if you want to support non-unique keys.
This tree will give you the basics: Create, Update, Delete and a way to quickly get an item by key, or all items with key < x, or with key between x and y etc. It can serve as the index for our table.
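To make that concrete, here is a minimal, unbalanced binary-search-tree sketch (the names are illustrative, not webdb's actual API; a real AVL tree would also rebalance on insert):

// Minimal unbalanced BST node
function Node(key, value) {
  this.key = key;
  this.value = value;
  this.left = null;
  this.right = null;
}

// Insert or update; a real AVL implementation would rebalance here
function insert(root, key, value) {
  if (root === null) return new Node(key, value);
  if (key < root.key) root.left = insert(root.left, key, value);
  else if (key > root.key) root.right = insert(root.right, key, value);
  else root.value = value; // same key: update in place
  return root;
}

// Look up a single key
function find(root, key) {
  if (root === null) return undefined;
  if (key < root.key) return find(root.left, key);
  if (key > root.key) return find(root.right, key);
  return root.value;
}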
A schema
As a next step I wrote code that lets the client code define a schema. Methods like createTable() etc. Schemas are typically associated with SQL, but even no-SQL sort-of has a schema; they usually require you to mark the ID field and any other fields you want to search on. You can make your schema as fancy as you want, but you typically want to model at least which column(s) serve as primary key and which fields will be searched on frequently and need an index.
Creating a data structure to store a table
I decided to use the tree I created in the first step to store my items. These were simple JS objects. Having defined which field contains the PK, I could simply insert the item into the tree using that field's value as the key. This gives me quick lookup by ID (range).
Next I added another tree for every column that needs an index. In these trees I did not store the full record, but only the key. So to fetch a customer by last name, I would first use the index on last name to get the ID, then the primary key index to get the actual record. The reason I did not just store a (reference to the) actual object is that it makes set operations a little bit simpler (see the next step).
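Conceptually the table then looks something like this (again just an illustration, reusing the insert/find helpers from the tree sketch above):

// One tree holds full records keyed by PK; each indexed column gets its own
// tree that maps the column value to the record's PK.
// (For non-unique last names the index tree would store an array of IDs, as noted above.)
var customersById = null;        // id -> full record
var customersByLastName = null;  // lastName -> id

function insertCustomer(customer) {
  customersById = insert(customersById, customer.id, customer);
  customersByLastName = insert(customersByLastName, customer.lastName, customer.id);
}

function findByLastName(lastName) {
  var id = find(customersByLastName, lastName);  // index lookup gives the PK
  if (id === undefined) return undefined;
  return find(customersById, id);                // PK lookup gives the record
}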
Querying
Now that we have a table with indexes for PK and search fields, we can implement querying. I did not take this very far as it becomes complicated quickly, but you can get some nice functionality with just some basics. WebDB does not implement joins; all queries operate only on a single table. But once you understand this you see a pretty clear (though long and winding) path to doing joins and other complicated stuff as well.
In WebDB, to get all customers with firstName = 'John' and city = 'New York' (assuming those are two search fields), you would write something like:
var webDb = ...
var johnsFromNY = webDb.customers.get({
firstName: 'John',
city: 'New York'
})
To solve it, we first do two lookups: we get the set X of all IDs of customers named 'John' and we get the set Y of all IDs of customers from New York. We then perform an intersection on these two sets to get all IDs of customers that are both named 'John' AND from New York. We then run through our set of resulting IDs, getting the actual record for each one and adding it to the result array.
Using the set operators like union and intersection we can perform AND and OR searches. I only implemented AND.
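A sketch of how such an AND query can be resolved with a set intersection (illustrative only; customersByFirstName, customersByCity and findAllIds are assumed helpers in the style of the sketches above, not webdb's real API):

// Intersect two arrays of IDs
function intersect(a, b) {
  var inB = {};
  b.forEach(function (id) { inB[id] = true; });
  return a.filter(function (id) { return inB[id]; });
}

// firstName = 'John' AND city = 'New York'
var x = findAllIds(customersByFirstName, 'John');   // set X: IDs of all Johns
var y = findAllIds(customersByCity, 'New York');    // set Y: IDs of all New Yorkers
var ids = intersect(x, y);                          // named 'John' AND from New York
var results = ids.map(function (id) { return find(customersById, id); });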
Doing joins would (I think) involve creating temporary tables in memory, then populating them as the query runs with the joined results, then applying the query criteria to the temp table. I never got there. I attempted some syncing logic next but that was too ambitious and it went downhill from there :)
I'm currently speccing out a project that stores threaded comment trees.
For those of you unfamiliar with what I'm talking about, I'll explain: basically, every comment has a parent comment, rather than just belonging to a thread. Currently, I'm working on a relational SQL Server model for storing this data, simply because it's what I'm used to. It looks like this:
Id int --PK
ThreadId int --FK
UserId int --FK
ParentCommentId int --FK (relates back to Id)
Comment nvarchar(max)
Time datetime
What I do is select all of the comments by ThreadId, then in code, recursively build out my object tree. I'm also doing a join to get things like the User's name.
It just seems to me that maybe a document store like MongoDB, which is NoSQL, would be a better choice for this sort of model. But I don't know anything about it.
What would be the pitfalls if I do choose MongoDB?
If I'm storing it as a Document in MongoDB, would I have to include the User's name on each comment to prevent myself from having to pull up each user record by key, since it's not "relational"?
Do you have to aggressively cache "related" data on the objects that need it when you're using MongoDB?
EDIT: I did find this article about storing trees of information in MongoDB. Given that one of my requirements is the ability to show a logged-in user a list of his recent comments, I'm now strongly leaning towards just using SQL Server, because I don't think I'll be able to do anything clever with MongoDB that will result in real performance benefits. But I could be wrong. I'm really hoping an expert (or two) on the matter will chime in with more information.
The main advantage of storing hierarchical data in Mongo (and other document databases) is the ability to store multiple copies of the data in ways that make queries more efficient for different use cases. In your case, it would be extremely fast to retrieve the whole thread if it were stored as a hierarchical nested document, but you'd probably also want to store each comment un-nested or possibly in an array under the user's record to satisfy your 2nd requirement. Because of the arbitrary nesting, I don't think that Mongo would be able to effectively index your hierarchy by user ID.
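For illustration, a thread stored as one nested document might look roughly like this (the field names are made up, and note the user's name is denormalized onto each comment, which is exactly the trade-off raised in the question):

{
  _id: ObjectId("..."),          // placeholder id
  threadTitle: "Example thread",
  comments: [
    {
      userId: 42,
      userName: "alice",         // denormalized so no per-comment user lookup is needed
      text: "Top-level comment",
      time: ISODate("2011-05-01T12:00:00Z"),
      replies: [
        { userId: 7, userName: "bob", text: "A reply", time: ISODate("2011-05-01T12:05:00Z"), replies: [] }
      ]
    }
  ]
}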
As with all NoSQL stores, you get more benefit by being able to scale out to lots of data nodes, allowing for many simultaneous readers and writers.
Hope that helps
I have an MS Access database with plenty of data. It's used by an application my team and I are developing. However, we've never added any foreign keys to this database, because we could control the relations from the code itself. We've never had any problems with this, and probably never will.
However, as development has progressed, I fear there's a risk of losing sight of all the relationships between the 30+ tables, even though we use well-normalized data. So it would be a good idea to get at least the relations between the tables documented.
Altova has created DatabaseSpy, which can show the structure of a database, but without the relations there isn't much to display. I could still use it to add relations to it all, but I don't want to modify the database itself.
Is there any software that can analyse a database by its structure and data and then make a best guess about its relations? (Just as documentation, not to modify the database.)
This application was created more than 10 years ago and has over 3000 paying customers who all use it. It's actually document-based, using an XML document for its internal storage. The database is just used as storage, and a single import/export routine converts it to and from XML. Unfortunately, the XML structure isn't very practical to use for documentation, and there's a second layer around this XML document to expose it as an object model. This object model is far from perfect too, but that's what 10 years of development can do to an application. We do want to improve it, but this takes time and we can't disappoint the current users by delaying new updates. Basically, we're stuck with its current design, and to improve it we need to make sure things are well-documented. That's what I'm working on now.
Only 30+ tables? Shouldn't take but a half hour or an hour to create all the relationships required. Which I'd urge you to do. Yes, I know that you state your code checks for those. But what if you've missed some? What if there are indeed orphaned records? How are you going to know? Or do you have bullet proof routines which go through all your tables looking for all these problems?
Use a largish 23" LCD monitor and have at it.
If your database does not have relationships defined somewhere other than code, there is no real way to guess how tables relate to each other.
Worse, you can't know the type of relationship and whether cascading of update and deletion should occur or not.
Having said that, if you followed some strict rules for naming your foreign key fields, then it could be possible to reconstruct the structure of the relationships.
For instance, I use a scheme like this one:
Table Product
- Field ID /* The Unique ID for a Product */
- Field Designation
- Field Cost
Table Order
- Field ID /* the unique ID for an Order */
- Field ProductID
- Field Quantity
The relationship is easy to detect when looking at the Order: Order.ProductID is related to Product.ID, and this can easily be ascertained from code by going through each field.
If you have a similar scheme, then how much you can get out of it depends on how well you follow your own convention, but it could go to 100% accuracy, although you'll probably have some exceptions (that you can build into your code or, better, look up somewhere).
The other solution applies if each of your tables' unique IDs follows a different numbering scheme.
Say your Order.ID is in fact following a scheme like OR001, OR002, etc and Product.ID follows PD001, PD002, etc.
In that case, going through all fields in all tables, you can search for FK records that match each PK.
If you're following a sane convention for naming your fields and tables, then you can probably automate the discovery of the relations between them, store that in a table and manually go through to make corrections.
Once you're done, use that result table to actually build the relationships from code using the Database.CreateRelation() method (look up the Access documentation, there is sample code for it).
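A hedged sketch of what that could look like with DAO, assuming your discovery pass found the Order.ProductID -> Product.ID pair from the example above (adapt the names to whatever your result table contains):

' Create one relation between Product.ID and Order.ProductID using DAO.
' dbRelationDontEnforce keeps it documentation-only, so existing data that
' violates the relation won't block its creation.
Sub CreateDiscoveredRelation()
    Dim db As DAO.Database
    Dim rel As DAO.Relation

    Set db = CurrentDb
    Set rel = db.CreateRelation("ProductOrders", "Product", "Order", dbRelationDontEnforce)
    rel.Fields.Append rel.CreateField("ID")       ' field in the primary table
    rel.Fields("ID").ForeignName = "ProductID"    ' matching field in the foreign table
    db.Relations.Append rel
End Sub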
You can build a small piece of VBA code, divided into two parts:
Step 1 implements the database relations with the Database.CreateRelation method
Step 2 deletes all the created relations again (through the Relations collection's Delete method)
As Tony said, 30 tables are not that much, and the script should be easy to set up. Once it is set up, stop the process after step 1, run the Access documenter (Tools\Analyse\Documenter) to get your documentation ready, then launch step 2. Your database will then be unchanged and your documentation ready.
I advise you to keep this code and run it regularly against your database to check that your relational model still matches the data.
There might be a tool out there that might be able to "guess" the relations, but I doubt it. Frankly, I am scared of databases without proper foreign keys in particular, and of multi-user apps that use Access as a DBMS as well.
I guess that the app must be some sort of internal tool; otherwise I would suggest that you move to a proper DBMS (SQL Server Express is free) and add the foreign keys.