I am new to BigQuery and not familiar with dealing with tables that have repeated rows. I am aware that in Standard SQL in BigQuery that it will automatically handle the repeated rows, and I have seen how that works. However, I am trying to bring the data into a visualization tool that isn't able to handle tables with repeated rows.
What I am trying to do is run a query that is going to save the data in a view with the data flatten (legacy language)/unnested (standard language). All I am trying to include in the result of the query is the fields:
id
itemizations (this is the repeated field)
Could you give me some input on how to accomplish this or what other information you might need to help provide an answer to this? Thank you in advance.
Related
I am currently working with java spring and postgres.
I have a query on a table, many filters can be applied to the query and each filter needs many joins.
This query is very slow, due to the number of joins that must be performed, also because there are many elements in the table.
Foreign keys and indexes are correctly created.
I know one approach could be to keep duplicate information to avoid doing the joins. By this I mean creating a new table called infoSearch and keeping it updated via triggers. At the time of the query, perform search operations on said table. This way I would do just one join.
But I have some doubts:
What is the best approach in postgres to save item list flat?
I know there is a json datatype, could I use this to hold the information needed for the search and use jsonPath? is this performant with lists?
I also greatly appreciate any advice on another approach that can be used to fix this.
Is there any software that can be used to make this more efficient?
I'm wondering if it wouldn't be more performant to move to another style of database, like graph based. At this point the only problem I have is with this specific table, the rest of the problem is simple queries that adapt very well to relational bases.
Is there any scaling stat based on ratios and number of items which base to choose from?
Denormalization is a tried and true way to speed up queries/reports/searching processes for relational databases. It uses a standard time vs space tradeoff to reduce the time of query, at the cost of duplicating the data and increasing write/insert time.
There are third party tools that are specifically designed for this use-case, including search tools (like ElasticSearch, Solr, etc) and other document-centric databases. Graph databases are probably not useful in this context. They are focused on traversing relationships, not broad searches.
In the application I am working on, we have data grids that have the capability to display custom views of the data. As a point of reference, we modeled this feature using the concept of views as they exist in SharePoint.
The custom views should have the following capabilities:
Be able to define which subset of columns (of those that are
available) should be displayed in the view.
Be able to define one or
more filters for retrieving data. These filters are not constrained
to use only the columns that are in the result set but must use one
of the available columns. Standard logical conditions and operators
apply to these filters. For example, ColumnA Equals Value1 or
ColumnB >= Value2.
Be able to define a set of columns that the data will be sorted by. This set of columns can be one or more columns
from the set of columns that will be returned in the result set.
Be
able to define a set of columns that the data will be grouped by.
This set of columns can be one or more columns from the set of
columns that will be returned in the result set.
I have application code that will dynamically generate the necessary SQL to retrieve the appropriate set of data. However, it appears to perform poorly. When I run across a poorly performing query, my first thought is to determine where indexes might help. The problem here is that I won't necessarily know which indexes need to be created as the underlying query could retrieve data in many different ways.
Essentially, the SQL that is currently being used does the following:
Creates a temporary table variable to hold the filtered data. This table contains a column for each column that should be returned in the result set.
Inserts data that matches the filter into the table variable.
Queries the table variable to determine the total number of rows of data.
If requested, determines the grouping values of the data in the table variable using the specified grouping columns.
Returns the requested page of the requested page size of data from the table variable, sorted by any specified sort columns.
My question is what are some ways that I may improve this process? For example, one idea I had was to have my table variable only contain the columns of data that are used to group and sort and then join in the source table at the end to get the rest of the displayed data. I am not sure if this would make any difference which is the reason for this post.
I need to support versions 2014, 2016 and 2017 of SQL Server in addition to SQL Azure. Essentially, I will not be able to use a specific feature of an edition of SQL Server unless that feature is available in all of the aforementioned platforms.
(This is not really an "answer" - I just can't add comments yet because my reputation score isn't high enough yet.)
I think your general approach is fine - essentially you are making a GUI generator for SQL. However a few things:
This type of feature is best suited for a warehouse or read only replica database. Do not build this on a live production transactional database. There are permutations that you haven't thought of that your users will find that will kill your database (it's also true from a warehouse standpoint, but they usually don't have response time expectations as a transactional database)
The method you described for doing paging is not efficient from a database standpoint. You are essentially querying, filtering, grouping, and sorting the same exact dataset multiple times just to cherry pick a few rows each time. If you have the data cached, that might be ok, but you shouldn't make that assumption. If you have the know how, figure out how to snapshot the entire final data set with an extra column to keep the data physically sorted in the order the user requested. That way you can quickly query the results for your paging.
If you have a Repository/DAL layer, design your solution so that in the future certain combinations of tables/columns can utilize hardcoded queries/stored procedures. There will inevitably be certain queries that pop up that cause you performance issues and you may have to build a custom solution for specific queries in order to get the desired performance that can't be obtained by your dynamic sql
We're starting to implement Unicode as we've added some international customers. There are some issues comparing character data in SSIS because of capitals, accents, and other data problems.
I've thought that the Fuzzy logic lookup could be a good solution. However, when testing this solution out, I realized that in a lot of our existing code we limit what data to process, and send in those values by parameters.
I've noticed that in the Fuzzy Lookup, I can specify the name of the table, but I can't make changes like remove a % from a field and turn it into a decimal. Any ideas how we can setup the lookup with calculated fields?
Thanks!
Create a view in your database with the proper transformation your require using a sql query.
I'm currently writing some code for one of my classes involving distributed and parallel database processing. I'm doing horizontal fragmentation on some data and required to keep track of different pieces of data.
The professor recommends storing "metadata" to keep track of some basic computations. Is this as simple as creating another table and storing some basic information, or is there a much more efficient way of doing this?
Example:
I need to track ranges for min/max values of every table in my database. Should I store that information in an entirely new table or is there a better way of achieving this?
Example: I need to track ranges for min/max values of every table in my database. Should I store that information in an entirely new table or is there a better way of achieving this?
Yes, you should store min/max in a different table. Depending on your application, you might need more than one of those kinds of tables.
Each insert, update, or delete statement can change either or both of those values. Think about how you want to handle that. (Triggers, probably.)
Terminology
Metadata just means "data about other data", and min/max values for one or more columns in each table is arguably data about other data. But I've never seen such data called metadata. It's always either summary or aggregate data.
I think you'll find that when most DBAs and database developers use metadata, they're talking about system tables or the information_schema views that are built on top of system tables.
I'm a newbie to app development. I'm using Xcode 4.3.2. I'm attempting to develop an app using a tab bar with a table view. In the table view I need to list about 100 cities and info about those 100 cities when the user selects one. Basically, I already have that data about the cities in a Excel spreadsheet.
I can't really find good examples of what I want to achieve. I've heard the terms parsing XML, SQLite, Core Data, database, etc, and I'm not sure if that is what I need to do.
I'd thankfully accept any suggestions.
If the data in the table are changing or edited, then by using a database, you will avoid rolling a new patch with those minor changes (you just change the values in the db)
If the data is the same and won't change for a long time and you plan to patch the application, then you just need a source for that data (the spreadsheet)
For parsing the data, you can use anything, when taking about showing 100 cities, it depends how big the total data you will be querying, how fast it needs to be and you just need to benchmark it.
If you are querying about 500k records and you need to do some 'figuring out' and it takes too long to load. Then, transforming your data into xml then parsing it may give you better performance.
You have to at least design your way into what you want to achieve. Check the performance and tweak it to find the decent spot.
Right now I look at it as tackling an unknown problem. Spend some time and build something. This will help you see the potential problems better.
While databases are good, for a few hundred elements you can tolerate inefficiency. If your existing data are in an Excel spreadsheet, the easiest way to get them into your app is to export the Excel spreadsheet to Comma-Separated-Values (CSV), then make your app read CSV files. (If your Excel spreadsheet has multiple worksheets, you'll need to convert each separately.)
How do you parse CSV? See iPhone : How to convert CSV format into NSData or NSString?
You'll end up with arrays of arrays of NSString. You'll probably need to define a new class for your city data, and convert each row in the imported data to one city element.
If you need to know more, posting a few rows from your spreadsheet may help.