Types of Netezza Planners

What are the types of planners used in Netezza, and what is each one for?
E.g., FACTREL_PLANNER
Can anyone explain them, or provide a link where I can read about them?

I just found an unofficial blog post that mentions:
the Fact Relationship (factrel) planner,
the Snowflake planner,
the Star planner.
Here's a link to one document entitled Determining fact relations with fact relation planner and Snowflake planner, but the IBM website won't let me view it even after logging in with my customer ID. You may have more luck. The title suggests that the factrel planner uses one algorithm (absolute table size) to guess which tables are your fact tables, while the Snowflake planner uses a different method (size relative to other tables).
I can't find anything relevant about the snowflake planner or the star planner in any of the official documents or training documents I have. Searches for snowflake or star always return things about schemas rather than planners.

Related

Is there a way to catalog data in Snowflake?

Is there a way in Snowflake to describe/catalog a schema so it is easier to search? For example, a column is named "s_id", which really represents "system identifier". Is there a way to specify that s_id means "system identifier", so that a user can search for "system identifier" and get s_id back as the column name?
Just about every object in Snowflake allows a comment to be added. You could leverage the comment field to describe the column's intent, which can then be searched in information_schema.columns. I've seen many Snowflake customers use comments as a full-blown data dictionary. My only suggestion is to avoid apostrophes in your comments, as they mangle your get_ddl() results.
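To illustrate the idea, here is a minimal sketch of a data-dictionary search over column comments. The rows below stand in for a result set pulled from Snowflake's information_schema.columns; the table and column names are hypothetical.

```python
# Hypothetical rows as they might come back from:
#   SELECT table_name, column_name, comment
#   FROM information_schema.columns;
COLUMNS = [
    {"table_name": "ORDERS", "column_name": "S_ID", "comment": "system identifier"},
    {"table_name": "ORDERS", "column_name": "O_TS", "comment": "order timestamp"},
    {"table_name": "USERS",  "column_name": "U_ID", "comment": "user identifier"},
]

def search_dictionary(term, columns=COLUMNS):
    """Return (table, column) pairs whose comment mentions the term,
    mirroring a query like:
        SELECT table_name, column_name
        FROM information_schema.columns
        WHERE comment ILIKE '%<term>%';
    """
    term = term.lower()
    return [(c["table_name"], c["column_name"])
            for c in columns
            if term in (c["comment"] or "").lower()]

print(search_dictionary("system identifier"))  # → [('ORDERS', 'S_ID')]
```

In practice you would run the ILIKE query shown in the docstring directly against Snowflake; the Python function just demonstrates the lookup behavior a comment-based dictionary gives you.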
Adding comments to tables and fields may serve the purpose, but in a complex enterprise environment where many groups are involved, that approach is hard to sustain over time. You would need a single data-modeling tool, with clear ownership, that promotes such changes to Snowflake. Managing this information via comments is error-prone, which is why Snowflake has partners that address these different development requirements.
If you go to their partner website (link), you will find a lot of companies that have good integration with SF and one of them is Alation (link). You can try it for free for 30 days and see if that serves your purpose of data cataloging or not.

Can Solr, Elasticsearch, and Kibana outperform SQL Server's cube technologies?

This trio of products came up as an alternative to SQL Server for searching and presenting analytics over a survey-based pattern of about 100 million data points. A survey pattern is basically questions x answers x forms x studies, and in our case it is very QA-oriented, about how people did their jobs. About 7% of our data points cannot be quantified because they are comments.
So, can this community envision (perhaps with a link to a success story) leveraging these products for slicing and dicing metrics (via drag and drop), along with comments, over 100 million data points, and outperforming SQL Server? Our metrics can be dollars, scores, counts, or hours, depending on the question. We have at least two hierarchies, one over people and the other over departments. Both are temporal, in that their relationships differ depending on the date (i.e., changing dimensions). In all there are about 90 dimensions for each data point, depending on how you count the hierarchy levels.
You can't compare a SQL engine and Elasticsearch/Solr directly.
It depends on how you want to query the data: joins or not, full-text search or not, etc.
Like Thomas said, it depends on your data and on how you want to query it. In general, for text-oriented data, NoSQL will be better and provide more functionality than SQL. However, if I understand correctly, only 7% of your data is text-focused (the comments), so I assume the rest is structured.
In terms of performance, it depends on what kind of text analysis you want to do and which queries you want to recreate. For example, joining is usually much simpler and quicker in SQL than its non-relational equivalent. You could set up a basic Solr instance, recreate some of your text-related SQL queries as Solr equivalents, and see how it performs on your data in comparison.
While NoSQL is usually touted as better at scaling, which of the two is better in a given situation depends heavily on your data and requirements.
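As a sketch of what "recreating a SQL query as a Solr equivalent" might look like, here is a hypothetical rollup (average score per department) expressed both as SQL and as a Solr JSON Facet API request. The field names (dept, score) and the collection name are assumptions, not from the original question.

```python
import json

# Reference SQL for the rollup being recreated:
sql = """
SELECT dept, AVG(score) AS avg_score
FROM responses
GROUP BY dept;
"""

# The same aggregation as a Solr JSON Facet API request body,
# which would be POSTed to /solr/responses/query.
solr_request = {
    "query": "*:*",   # match all survey responses
    "limit": 0,       # suppress document hits; we only want facets
    "facet": {
        "by_dept": {
            "type": "terms",        # bucket by each dept value
            "field": "dept",
            "facet": {"avg_score": "avg(score)"},  # per-bucket average
        }
    },
}

print(json.dumps(solr_request, indent=2))
```

Benchmarking a handful of such paired queries against the same data is a cheap way to answer the performance question empirically before committing to either stack.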

Database choice for crawled page semantics

I'm not sure whether this question has already been asked in the past.
I'm writing a web crawler intended to extract promotion, price, and product-description information from multiple websites.
Which database would be ideal for an in-memory comparison of promotion and price data, based on identifying the same product across multiple websites?
I know the design is going to be complex for the Scraper, HTMLDataProcessor, and Storage components used for wrangling, but what I'm looking for is a choice for the data layer.
I appreciate any help with this.
I'd suggest you first create an object model or entity-relationship (ER) diagram for all your entities.
For instance, see the tutorial here: http://creately.com/blog/diagrams/er-diagrams-tutorial/
Once you have the diagram and the relationships between your entities, you can decide whether you need a relational database or not.
You need to answer questions like:
Do you care about FK (foreign key) constraints?
What is the most common query, and do you care about its performance?
Is an in-memory database sufficient, or do you need the data to be persisted?
Think along those lines.
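Whatever the final data layer, the in-memory comparison itself can be prototyped independently. Below is a hedged sketch: group scraped offers by a normalized product key and find the cheapest site per product. The product names, site names, and the naive normalization are all made up for illustration; real cross-site product matching usually needs fuzzier logic.

```python
from collections import defaultdict

def normalize(name):
    """Naive product identity: lowercase and collapse whitespace.
    Stands in for whatever real matching logic identifies the
    same product across different websites."""
    return " ".join(name.lower().split())

def compare_offers(offers):
    """offers: iterable of (site, product_name, price) tuples.
    Returns {product_key: (cheapest_site, price)}."""
    by_product = defaultdict(dict)
    for site, name, price in offers:
        by_product[normalize(name)][site] = price
    # For each product, pick the site with the lowest price.
    return {key: min(prices.items(), key=lambda kv: kv[1])
            for key, prices in by_product.items()}

offers = [
    ("site-a", "Acme Widget  9000", 19.99),
    ("site-b", "acme widget 9000", 17.49),
    ("site-a", "Road Runner Trap", 42.00),
]
print(compare_offers(offers))
# {'acme widget 9000': ('site-b', 17.49), 'road runner trap': ('site-a', 42.0)}
```

If this logic fits in memory, the database choice mostly reduces to persistence and query-pattern questions like those listed above.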

Cassandra data modeling for social network with follower and following actions

Can anybody describe what Cassandra data modeling should look like for a social network that allows its users to follow each other, and that has a timeline and other features common to social networks like Twitter?
I found twissandra on GitHub, but it was confusing to me.
Please describe how the following and follower tables should look in Cassandra, or provide links to tutorials.
Unlike relational database schema design, where the queries to be performed matter mainly in the context of optimization, schema design for Cassandra is query-oriented: you first need to figure out what kind of information you will ask for in order to design an effective Cassandra instance. A wrong schema design can kill Cassandra's performance.
Therefore, regarding your question, you should first have a complete picture of your context, and then go back to the design phase.
I have personally found the material provided by DataStax Academy really useful. The courses are free; you only need to register. I would suggest first taking a look at the Cassandra system architecture, if you are not familiar with it, in order to fully understand the schema design choices, and then looking at the main design principles.
Regarding the methodology to be used, I don't think there's an established one right now. I would suggest using Chebotko Diagrams, which are well explained in this article.
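To make the query-first principle concrete for the follower/following case, here is a rough sketch of the twissandra-style model: one table per query, with each follow relationship written twice so that both "who do I follow?" and "who follows me?" are single-partition reads. The Python dicts below simulate the two tables; the CQL in the comments is an illustrative approximation, not taken from the original question.

```python
# Roughly equivalent CQL:
#   CREATE TABLE following (user text, followed text,
#                           PRIMARY KEY (user, followed));
#   CREATE TABLE followers (user text, follower text,
#                           PRIMARY KEY (user, follower));
from collections import defaultdict

following = defaultdict(set)   # partition key: user -> users they follow
followers = defaultdict(set)   # partition key: user -> users who follow them

def follow(user, target):
    """A follow action writes to BOTH tables (in Cassandra, ideally
    in a logged batch so the two denormalized copies stay consistent)."""
    following[user].add(target)
    followers[target].add(user)

follow("alice", "bob")
follow("carol", "bob")

print(sorted(followers["bob"]))    # ['alice', 'carol']
print(sorted(following["alice"]))  # ['bob']
```

The duplication is deliberate: in Cassandra you trade write amplification for the ability to answer each query from one partition, which is exactly the mindset shift from relational design described above.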

What is a good web application SQL Server data mart implementation in ElasticSearch?

Coming from a RDBMS background and trying to wrap my head around ElasticSearch data storage patterns...
Currently in SQL Server we have a star-schema data mart, RecordData. Rows are organized by user ID; by geographic location, which pertains to the rest of the searchable record; and by title and description (which are free-text search fields).
I would like to move this over to Elasticsearch, and have read about creating a separate index per user. If I understand correctly, with this suggestion I would be creating a RecordData type in each user index, correct? What is a recommended naming convention for user indices that will keep Kibana analysis simple?
One issue I have with this recommendation: how would you organize multiple web applications on the ES server? You wouldn't want all those user indices scattered everywhere, would you?
Is it so bad to have one index per application, and type per SQL Server table?
Since in SQL Server we have other tables for user configuration, keyed by user IDs, I take it that I could then create new ES types in the user indices for configuration. Is this a recommended pattern? I would rather not run two database systems for this web application.
Suggestions welcome, thank you.
I went through the same thing, and there are a few things to take into account.
Data Modeling
You say you use a star schema today. Elasticsearch is typically appropriate for denormalized data, where the totality of the information resides in each document, unlike with a star schema. If you can live with denormalized data, that is fine, but since you already have a star schema, I assume denormalization is not an option: you don't want to go and update millions of documents each time a location name changes, for example (if I understand the use case). At least in my use case that wasn't an option.
What are Elasticsearch options for normalized data?
This leads us to think about how to put star-schema-like data into a system like Elasticsearch. There are a few options in the documentation; the main ones I focused on were:
Nested Objects - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html . With nested objects the entire information is kept in a single document, meaning one location and its related users would live in one document. That may be suboptimal because the document will be huge, and again, a change in the location name will require updating the entire document. So this is better, but still not optimal.
Parent-Child Relationship - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html . In this case the location and the user records would be kept in separate indices, similarly to a relational database. This seems to be the right modeling for what we need. The only major issue with this option is that Kibana 4 does not provide ways to manipulate/aggregate documents based on a parent/child relationship as of this writing. So if your main driver for using Elasticsearch is Kibana (it was mine), that pretty much eliminates this option. If you want to benefit from Elasticsearch's speed as an engine, this seems to be the desired option for your use case.
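The two document shapes can be sketched side by side. The field names and values below are made up for illustration; the point is the structural difference between one self-contained document and a parent joined to children at query time.

```python
# Option 1: nested objects -- one big document per location.
# The location and all its related users travel together, so the
# document grows with its users and any user change rewrites it.
nested_doc = {
    "location": "Springfield",
    "users": [
        {"user_id": 1, "title": "Inspector"},
        {"user_id": 2, "title": "Technician"},
    ],
}

# Option 2: parent/child -- location and users indexed separately
# and related by a key, much closer to the relational star schema.
# Renaming the location touches only the one parent document.
parent_doc = {"location_id": 10, "location": "Springfield"}
child_docs = [
    {"user_id": 1, "title": "Inspector",  "location_id": 10},
    {"user_id": 2, "title": "Technician", "location_id": 10},
]

# Same information either way; what differs is the update cost
# and which query features (e.g., Kibana aggregations) apply.
print(len(nested_doc["users"]), "users in both shapes")
```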
In my opinion, once you get the data modeling right, all of your questions will be easier to answer.
Regarding the organization of the servers themselves: we run a separate cluster of three Elasticsearch nodes behind a load balancer (all hosted in a cloud), and have all the web applications connect to that cluster using the Elasticsearch API.
Hope that helps.