Non-standard joins in QlikView? - database

Lately we have been testing QlikView in the office. The first impression is good: it has an attractive interface and performs very fast. We want to use it as a database frontend for our customers. We are also trying to determine whether it can take over parts of our relational database structure. However, we are in doubt whether its database functions are advanced enough to be more than an attractive frontend.
Specifically, we run into the following problem. The equivalent of normal JOIN (equijoin) operations can be done in QlikView simply by setting equal field names across tables - those fields will then be linked. However, one of our traditional SQL JOIN operations uses a "BETWEEN" query to find out whether a date is in a certain range and join the data on that.
Is it possible to specify such a "non-equijoin" relationship between tables in QlikView? Or is this an inherent limitation to the so-called "associative database" structure?

Marcus' answer is correct. The way to do this is to use IntervalMatch. You can have the two tables as they are and add a "between" relationship between, using IntervalMatch. You can't add relationships after the load script has run.
First you'll have to load the table that has the date range (sql queries omitted). Let's say:
Ranges:
LOAD
rangeID,
validfrom, // date
validto, // date
commonkey, // common key for the two tables
price; // the data that's needed as a result of the linking
Second, you load another table with the date
Data:
LOAD
column1,
column2,
date,
commonkey;
Next you will have to use the IntervalMatch. This is one way to do it:
Left Join (Data)
IntervalMatch(date, commonkey)
LOAD
validfrom,
validto,
commonkey
Resident Ranges;
Now you have the link between the two tables. You can delete the resulting synthetic key by adding this:
Left Join (Data)
LOAD
validfrom,
validto,
commonkey,
rangeID
Resident Ranges;
DROP Fields validfrom, validto FROM Data;
Now the tables are linked by using the rangeID key. If the tables don't have some key in common, like a category id or something, (ie. just the dates need to be matched), you can just ignore the commonkey in the example above. I just wanted to include it in the example because I needed it in my own case and hopefully it will help someone with a similar issue.
You can find this in the Qlikview help labeled "IntervalMatch (extended)". The Qlikview cookbook (fillrowsintervalmatch.qvw) also helped me with this issue.

Sure can - I think what you want it the IntervalMatch function.

Related

Query two custom objects joining on the Name field

I want to create a join on two custom objects joining on the Name field. Normally joins require a lookup or master-detail relationship between the two objects, but I just want to do a text match.
I think this is a Salesforce limitation but I couldn't find any docs on whether this was so. Can anyone confirm this?
Yes, you can make a join (with dot notation or as subquery) only if there's a relationship present. And relationships (lookup or master-detail) can be made only by Id. There are several "mutant fields" (like Task.WhoId) but generally speaking you can't write a JOIN in SOQL and certainly can't use a text column as a foreign key.
http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_relationships.htm#relate_query_limits
Relationship queries are not the same as SQL joins. You must have a
relationship between objects to create a join in SOQL.
There are some workarounds though. Why exactly do you need the join?
Apex / SOQL - have a look at SOQL in apex - Getting unmatched results from two object types for example. Not the prettiest thing in the world but it works. If you want to try something really crazy - SOSL that would search your 2 objects at the same time?
Reports - you should have no problem grouping by text field - that means a joined report might give you results you're after. Since Winter'13 joined reports allow charts and exporting, that was quite a limiting factor...
Easy building of links between data - use external ids and upsert operation, especially if you plan to load data from outside SF easily. Check my answer for Can I insert deserialized JSON SObjects from another Salesforce org into my org?
Uniqueness constraints - you can still mark fields as required & unique.
Check against "dictionary" of allowed values - Validation rule with VLOOKUP might do what you're after.

should i consolidate these database tables .

i have an event calendar application with a sql database behind it and right now i have 3 tables to represent the events:
Table 1: Holiday
Columns: ID, Date, Name, Location, CalendarID
Table 2: Vacation
Columns: Id, Date, Name, PersonId, WorkflowStatus
Table 3: Event
Columns: Id, Date, Name, CalendarID
So i have "generic events" which go into the event tableand special events like holidays and vacation that go into these separate tables. I am debating consolidating these into a single table and just having columns like location and personid blank for the generic events.
Table 1: Event:
Columns : Id, Date, Name, Location, PersonId, WorkflowStatus
does anyone see any strong positives or negative to each option. Obviously there will be records that have columns that dont necessarily apply but it there is overlap with these three tables.
Either way you construct it, the application will have to cope with variant types. In such a situation I recommend that you use a single representation in the DBM because the alternative is to require a multiplicity of queries.
So it becomes a question of where you stick the complexity and even in a huge organization, it's really hard to generate enough events to worry about DBMS optimization. Application code is more flexible than hardwired schemata. This is a matter of preference.
If it were my decision, i'd condense them into one table. I'd add a column called "EventType" and update that as you import the data into the new table to specify the type of event.
That way, you only need to index one table instead of three (if you feel indexes are required), the data is all in one table, and the queries to get the data out would be a little more concise because you wouldn't need to union all three tables together to see what one person has done. I don't see any downside to having it all in one table (although there will probably be one that someone will bring up that i haven't thought of).
How about sub-typing special events to an Event supertype? This way it is easy to later add any new special events.
Data integrity is the biggest downside of putting them in one table. Since these all appear to be fields that would be required, you lose the ability to require them all by default and would have to write a trigger to make sure that data integrity was maintained properly (Yes, this must be maintained in the database and not, as some people believe, by the application. Unless of course you want to have data integrity problems.)
Another issue is that these are the events you need now and there may be more and more specialized events in the future and possibly breaking code for one type of event because you added another specialized field that only applies to something else is a big risk. When you make a change to add some required vacation information, will you be sure to check that it doesn't break the application concerning holidays? Or worse not error out but show information you didn't want? Are you going to look at the actual screen everytime? Unit testing just of code may not pick up this type of thing especially if someone was foolish enough to use select * or fail to specify columns in an insert. And frankly not every organization actually has a really thorough automated test process in place (it could be less risk if you do).
I personally would tend to go with Damir Sudarevic's solution. An event table for all the common fields (making it easy to at least get a list of all events) and specialized tables for the fields not held in common, making is simpler to write code that affects only one event and allowing the database to maintain its integrity.
Keep them in 3 separate tables and do a UNION ALL in a view if you need to merge the data into one resultset for consumption. How you store the data on disk need not be identical to how you need to consume the data so long as the performance is adequate.
As you have it now there are no columns that do not apply for any of the presented entities. If you were to merge the 3 tables into one you'd have to add a field at the very least to know which columns to expect to be populated and reduce your performance. Now when you query for a holiday alone you go to a subset of the data that you would have to sift through / index to get at the same data in a merged storage table.
If you did not already have these tables defined you could consider creating one table with the following signature...
create table EventBase (
Id int PRIMARY KEY,
Date date,
Name varchar(50)
)
...and, say, the holiday table with the following signature.
create table holiday (
Id int PRIMARY KEY,
EventId int,
Location varchar(50),
CalendarId int
)
...and join the two when you needed to do so. Choosing between this and the 3 separate tables you already have depends on how you plan on using the tables and volume but I would definitely not throw all into a single table as is and make things less clear to someone looking at the table definition with no other initiation.
Or combine the common fields and separate out the unique ones:
Table 1: EventCommon
Columns: EventCommonID, Date, Name
Table 2: EventOrHoliday
Columns: EventCommonID, CalendarID, isHoliday
Table3: Vacation
Columns: EventCommonID, PersonId, WorkflowStatus
with 1->many relationships between EventCommon and the other 2.

Storing Preferences/One-to-One Relationships in Database

What is the best way to store settings for certain objects in my database?
Method one: Using a single table
Table: Company {CompanyID, CompanyName, AutoEmail, AutoEmailAddress, AutoPrint, AutoPrintPrinter}
Method two: Using two tables
Table Company {CompanyID, COmpanyName}
Table2 CompanySettings{CompanyID, utoEmail, AutoEmailAddress, AutoPrint, AutoPrintPrinter}
I would take things a step further...
Table 1 - Company
CompanyID (int)
CompanyName (string)
Example
CompanyID 1
CompanyName "Swift Point"
Table 2 - Contact Types
ContactTypeID (int)
ContactType (string)
Example
ContactTypeID 1
ContactType "AutoEmail"
Table 3 Company Contact
CompanyID (int)
ContactTypeID (int)
Addressing (string)
Example
CompanyID 1
ContactTypeID 1
Addressing "name#address.blah"
This solution gives you extensibility as you won't need to add columns to cope with new contact types in the future.
SELECT
[company].CompanyID,
[company].CompanyName,
[contacttype].ContactTypeID,
[contacttype].ContactType,
[companycontact].Addressing
FROM
[company]
INNER JOIN
[companycontact] ON [companycontact].CompanyID = [company].CompanyID
INNER JOIN
[contacttype] ON [contacttype].ContactTypeID = [companycontact].ContactTypeID
This would give you multiple rows for each company. A row for "AutoEmail" a row for "AutoPrint" and maybe in the future a row for "ManualEmail", "AutoFax" or even "AutoTeleport".
Response to HLEM.
Yes, this is indeed the EAV model. It is useful where you want to have an extensible list of attributes with similar data. In this case, varying methods of contact with a string that represents the "address" of the contact.
If you didn't want to use the EAV model, you should next consider relational tables, rather than storing the data in flat tables. This is because this data will almost certainly extend.
Neither EAV model nor the relational model significantly slow queries. Joins are actually very fast, compared with (for example) a sort. Returning a record for a company with all of its associated contact types, or indeed a specific contact type would be very fast. I am working on a financial MS SQL database with millions of rows and similar data models and have no problem returning significant amounts of data in sub-second timings.
In terms of complexity, this isn't the most technical design in terms of database modelling and the concept of joining tables is most definitely below what I would consider to be "intermediate" level database development.
I would consider if you need one or two tables based onthe following criteria:
First are you close the the record storage limit, then two tables definitely.
Second will you usually be querying the information you plan to put inthe second table most of the time you query the first table? Then one table might make more sense. If you usually do not need the extended information, a separate ( and less wide) table should improve performance on the main data queries.
Third, how strong a possibility is it that you will ever need multiple values? If it is one to one nopw, but something like email address or phone number that has a strong possibility of morphing into multiple rows, go ahead and make it a related table. If you know there is no chance or only a small chance, then it is OK to keep it one assuming the table isn't too wide.
EAV tables look like they are nice and will save futue work, but in reality they don't. Genreally if you need to add another type, you need to do future work to adjust quesries etc. Writing a script to add a column takes all of five minutes, the other work will need to be there regarless of the structure. EAV tables are also very hard to query when you don;t know how many records you wil need to pull becasue normally you want them on one line and will get the information by joining to the same table multiple times. This causes performance problmes and locking especially if this table is central to your design. Don't use this method.
It depends if you will ever need more information about a company. If you notice yourself adding fields like companyphonenumber1 companyphonenumber2, etc etc. Then method 2 is better as you would seperate your entities and just reference a company id. If you do not plan to make these changes and you feel that this table will never change then method 1 is fine.
Usually, if you don't have data duplication then a single table is fine.
In your case you don't so the first method is OK.
I use one table if I estimate the data from the "second" table will be used in more than 50% of my queries. Use two tables if I need multiple copies of the data (i.e. multiple phone numbers, email addresses, etc)

Database design query

I'm trying to work out a sensible approach for designing a database where I need to store a wide range of continuously changing information about pets. The categories of data can be broken down into, for example, behaviour, illness etc. Data will be submitted on a regular basis relating to these categories, so i need to find a good way to design the db to efficiently accommodate this. A simple approach would just to store multiple records for each pet within each relevant table - e.g the behaviour table would store the behaviour data and would simply have a timestamp for each record along with the identifier for that pet. When querying the db, it would be straightforward to query the one table with the pet id, using the timestamps to output the correct history of submissions. Is there a more sensible way around this or does that make sense?
I would use a combination of lookup tables with a strong use of foreign keys. I think what you are suggesting is very common. For example, get me all the reported illnesses for a particluar pet during this data range would look something like:
Select *
from table_illness
where table_illness.pet_id = <value>
and date between table_illness.start_date and table_illness.finish_date
You could do that for any of the tables. The lookup tables will be a link between, for example, table_illness.illness_type and illness_types.illness_type. The illness_types table is where you would store the details on the types of illnesses.
When designing a database you should build your tables to mimic real-life objects or concepts. So in that sense the design you suggest makes sense. Each pet should have its own record in a pet table which doesn't change. Changing information should then be placed into the appropriate table which has the pet's id. The time stamp method you suggest is probably what I would do -- unless of course this is for a vet or something. Then I'd create an appointment table with the date and connect the illness or behavior to the appointment as well.

Database design - do I need one of two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track if any application uses a database or a bug tracking system so right now I have fields in the Applications table called
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
Why not get rid of the two Use fields and simply let a NULL value in the _ID fields indicate that the record does not use that application (bug tracking or database)
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).

Resources