When using the Salesforce translator, we need to cache translation results (not query results) to speed up the translation of SQL requests to SOQL. How can this be done in Teiid within a spring-boot-teiid application?
There is a feature in Google Cloud Platform that allows reporting resource usage. This is cool because it supports three things I will make use of:
Time range (usage_start_time, usage_end_time).
Labels (labels_key, labels_value).
Usage data (usage_amount and more fields).
And, by doing this, I can make use of plain SQL queries over the dumped data (which goes to BigQuery) to report what I need.
But is there a way I can do this without using BigQuery? Like doing a real-time query? Concretely, solve this requirement:
Take a time range.
Take the service type I need (e.g. stating "Google Cloud Run Deployment").
Take a custom label (let's say I have a label named "customer" and its value "my-mother").
Ask the usage amounts and corresponding pricing values for each resource of that service type and labels criteria.
Weight-summarize the end value (the value would be expressed in US dollars).
Is there a way I can do that with some direct GCP Billing API and not by using SQL on BigQuery dumps? (this means: An existing function or class in the billing library or somewhere else - not sure but one that allows me to ask this query).
You can export billing data to Pub/Sub, Cloud Storage, or BigQuery. The Billing APIs do not provide query abilities.
"Is there a way I can do that with some direct GCP Billing API and not by using SQL on BigQuery dumps?"
If you want those features without using BigQuery then you will need to create an export sink and load the data into a system that supports queries.
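As a rough illustration of that approach, the sketch below filters exported billing rows that have already been loaded into memory and sums their cost. The field names (`service`, `labels`, `usage_start_time`, `usage_end_time`, `cost`) and the sample rows are assumptions made for this example, loosely modelled on the fields named in the question. This is not a Billing API call.

```python
from datetime import datetime, timezone


def total_cost(rows, start, end, service, label_key, label_value):
    """Sum the cost of rows matching a time range, a service and a label."""
    total = 0.0
    for row in rows:
        in_range = row["usage_start_time"] >= start and row["usage_end_time"] <= end
        has_label = row["labels"].get(label_key) == label_value
        if in_range and row["service"] == service and has_label:
            total += row["cost"]
    return total


def utc(day, hour):
    """Helper for building sample timestamps (January 2024, UTC)."""
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc)


# Hypothetical exported rows; a real export has many more fields.
rows = [
    {"service": "Cloud Run", "labels": {"customer": "my-mother"},
     "usage_start_time": utc(1, 0), "usage_end_time": utc(1, 1), "cost": 1.25},
    {"service": "Cloud Run", "labels": {"customer": "my-mother"},
     "usage_start_time": utc(2, 0), "usage_end_time": utc(2, 1), "cost": 2.25},
    {"service": "Cloud Run", "labels": {"customer": "someone-else"},
     "usage_start_time": utc(2, 0), "usage_end_time": utc(2, 1), "cost": 9.0},
    {"service": "Compute Engine", "labels": {"customer": "my-mother"},
     "usage_start_time": utc(2, 0), "usage_end_time": utc(2, 1), "cost": 4.0},
]

result = total_cost(rows, utc(1, 0), utc(3, 0), "Cloud Run", "customer", "my-mother")
```

Only the first two rows match the service and the label, so `result` covers those two costs. The same filter maps naturally onto a WHERE clause in whatever query system the export is loaded into.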
The BigQuery dataset exports are the source of truth for your resource consumption. This is similar to the Cost and Usage Report (CUR) in AWS.
If you'd like to access the GCP billing data over an API - the best way would be to connect your billing dataset to a cloud cost management platform and use their APIs to query your cost by day, week, month or services, or by labels.
You can check out Economize - generous free tier, connects with your GCP billing dataset and you can use the API to query your data.
As mentioned in the title, I'd like to know what data sources Snowflake supports. I'm not completely sure how to even approach this question. I know you can create an external stage in the cloud storage of supported cloud providers, but what if I want to load data from an Oracle database, for example? What's the best solution in that case: is it to use the ODBC driver, or something else?
And please feel free to give me any suggestions, or advice on where to continue my research. Also, let me know if any part of my question is unclear so that I can rephrase it :)
Snowflake natively supports Avro, Parquet, CSV, JSON and ORC. These are landed in a stage for ingestion: your ELT/ETL tool of choice, or even a home-built application, must land the data in a stage, either internal or external.
That file is then ingested into Snowflake utilizing a COPY command either automated by said tool or using something like Snowpipe.
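For orientation, here is a minimal sketch of the kind of COPY statement such a tool would issue once a file is in a stage. The table and stage names are hypothetical, the helper is not part of any Snowflake library, and a real statement usually carries more file format options.

```python
# File formats listed in the answer above.
SUPPORTED_FORMATS = {"AVRO", "PARQUET", "CSV", "JSON", "ORC"}


def copy_statement(table, stage, file_format):
    """Render a basic COPY INTO statement for a staged file."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = '{fmt}')"


# Hypothetical table and stage names.
statement = copy_statement("orders", "oracle_export_stage", "csv")
```

For an Oracle source, the export step that writes files into the stage is the part your ETL tool or home-built application has to provide.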
We have documentation on Firehose / Kafka pipelines landing data for Snowpipe to ingest either through AUTO_INGEST notifications (limited to external stage) or calling our REST API.
All of this is covered by our documentation; simply google the terms I have mentioned and you will find plenty of material.
Multiple existing ETL tools let you define Snowflake as the destination and support a wide variety of sources.
Native Programmatic Interfaces
Snowflake Ecosystem - Data Integration
I want to know the pros and cons of using Fusion instead of regular Solr. Can you give some examples (such as a problem that can be solved easily using Fusion)?
First of all, I should disclose that I am the Product Manager for Lucidworks Fusion.
You seem to already be aware that Fusion works with Solr (or one or more Solr clusters or instances), using Solr for data storage and querying. The purpose of Fusion is to make it easier to use Solr, integrate Solr, and to build complex solutions that make use of Solr. Some of the things that Fusion provides that many people find helpful for this include:
Connectors and a connector framework. Bare Solr gives you a good API and the ability to push certain types of files at the command line. Fusion comes with several pre-built data source connectors that fetch data from various types of systems, process them as appropriate (including parsing, transformation, and field mapping), and sends the results to Solr. These connectors include common document stores (cloud and on-premise), relational databases, NoSQL data stores, HDFS, enterprise applications, and a very powerful and configurable web crawler.
Security integration. Solr does not have any authentication or authorization (though as of version 5.2, released this week, it does have a pluggable API and a basic implementation of Kerberos for authentication). Fusion wraps the Solr APIs with a secured version. Fusion has clean integrations with LDAP, Active Directory, and Kerberos for authentication. It also has a fine-grained authorization model for managing and configuring Fusion and Solr. And the Fusion authorization model can automatically link group memberships from LDAP/AD with access control lists from the Fusion Connectors data sources, so that you get document-level access control mirrored from your source systems when you run search queries.
Pipelines processing model. Fusion provides a pipeline model with modular stages (in both API and GUI form) to make it easier to define and edit transformations of data and documents. It is analogous to unix shell pipes. For example, while indexing you can include stages to define mappings of fields, compute new fields, aggregate documents, pull in data from other sources, etc. before writing to Solr. When querying, you could do the same, along with transforming the query, running and returning the results of other analytics, and applying security filtering.
Admin GUI. Fusion has a web UI for viewing and configuring the above (as well as the base Solr config). We think this is convenient for people who want to use Solr, but don't use it regularly enough to remember how to use the APIs, config files, and command line tools.
Sophisticated search-based features: Using the pipelines model described above, Fusion includes (and makes easy to use) some richer search-based components, including natural language processing and entity extraction modules, and real-time signals-driven relevancy adjustment. We intend to provide more of these in the future.
Analytics processing: Fusion includes and integrates Apache Spark for running deep analytics against data stored in Solr (or on its way in to Solr). While Solr implicitly includes certain data analytics capabilities, that is not its main purpose. We use Apache Spark to drive Fusion's signals extraction and relevancy tuning, and expect to expose APIs so users can easily run other processing there.
Other: many useful miscellaneous features like: dashboarding UI; basic search UI with manual relevancy tuning; easier monitoring; job management and scheduling; real-time alerting with email integration, and more.
A lot of the above can of course be built or written against Solr, without Fusion, but we think that providing these kinds of enterprise integrations will be valuable to many people.
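To make the pipeline idea from the list above concrete, here is a minimal sketch of indexing stages chained like unix pipes. It is plain Python and not Fusion's API; the stage names and document fields are hypothetical.

```python
from functools import reduce


def map_fields(doc):
    """Rename a source field to the name used in the index."""
    out = dict(doc)
    out["title"] = out.pop("headline")
    return out


def add_word_count(doc):
    """Compute a new field from an existing one."""
    return {**doc, "word_count": len(doc["body"].split())}


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    return reduce(lambda current, stage: stage(current), stages, doc)


indexed = run_pipeline(
    {"headline": "Hello", "body": "a short example body"},
    [map_fields, add_word_count],
)
```

Because every stage takes a document and returns a document, stages can be reordered, removed, or shared between pipelines without touching the others.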
Pros:
Connectors : Lucidworks provides you a wide range of connectors, with those you can connect to datasources and pull the data from there.
Reusability : In Lucidworks you can create pipelines for data ingestion and data retrieval. You can create pipelines with common logic so that these can be used in other pipelines.
Security : You can apply restrictions over data, i.e. security trimming. Lucidworks provides built-in query-pipeline stages for security trimming, or you can write a custom pipeline for your use case.
Troubleshooting : Lucidworks comes with discrete services, i.e. api, connectors, solr. You can troubleshoot any issue according to the service involved, since each service has its own logs. You can also configure JVM properties for each service.
Support : Lucidworks support is available 24/7 for help. You can create a support case according to the severity, and they will schedule a call for you.
Cons:
Not much, but it keeps you away from your normal development: you don't get much chance to open your IDE and start coding.
We are using the DevArt connector which pretends to be an ADO.NET connector to SFDC. It is super slow (13 minutes for some queries). What approach will return data the quickest?
And by any chance, is there an OData API to SFDC that is fast?
There are a few APIs you can use:
The SOAP API
CRUD operations and query (SOQL) support. Some metadata support. There are Enterprise and Partner variations. Can be added as a Web Service reference in Visual Studio.
The REST API
"Typically, the REST API operates on smaller numbers of records. You can GET a single record using its URL and you can also run a query and bring back a set of records that match that query." (from Salesforce APIs – What They Are & When to Use Them)
The Bulk API
REST-initiated batch processes that output XML or CSV data.
The Metadata API
Probably not applicable unless you are doing configuration or deployment-style tasks.
The Apex API
Again, not applicable unless you are working with Apex classes and running test cases.
The Streaming API
Allows you to register a query and get updates pushed to you when the query result changes.
They all have their advantages and disadvantages. There is a good summary in the Bulk API introduction.
At a guess I'd assume the DevArt connector is based on the SOAP API. The SOAP API can be fast, but it isn't an ideal way to bring back a very large number of records as the results are paged and the SOAP responses can be large. Other factors can also slow it down unnecessarily, such as querying fields that are never used.
The ADO.NET connector must be doing some interpretation of queries into SOQL. There may be joins that are inefficient when translated into SOQL.
I suspect the best solution will depend on what records and fields you are trying to query and how many results you are expecting to work with.
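As an example of avoiding fields that are never used, the sketch below builds a REST API query URL whose SOQL names only the fields that are needed. The instance URL, API version, and field list are placeholders, and authentication (an OAuth bearer token on the request) is left out.

```python
from urllib.parse import urlencode


def soql_query_url(instance_url, api_version, sobject, fields, limit=None):
    """Build a REST query URL that selects only the listed fields."""
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if limit is not None:
        soql += f" LIMIT {limit}"
    query_string = urlencode({"q": soql})
    return f"{instance_url}/services/data/v{api_version}/query?{query_string}"


# Placeholder instance URL and API version.
url = soql_query_url(
    "https://example.my.salesforce.com", "58.0", "Account", ["Id", "Name"], limit=200
)
```

Writing the SOQL by hand like this also sidesteps whatever SQL-to-SOQL translation the ADO.NET connector performs, which makes it easier to see whether the connector or the query itself is the slow part.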
I work for a large organization with many different relational datastores containing overlapping information. We are looking for a solution for integrated querying on all our data at once. We are considering using Semantic Web technology for this purpose. Specifically, we plan to:
Create a unified ontology
Map each database to this ontology
Create a SPARQL endpoint for each database
Use a federation engine to unify them to one endpoint.
I am now in search of appropriate tools for the last stage of this plan. I have heard that Fuseki is appropriate for this case, but have been unable to find any relevant documentation.
Can you please give your opinion on the appropriateness of Fuseki for this task, or even better, point me at some proper documentation?
Thanks in advance!
Oren
http://jena.apache.org/
You want to read about Fuseki, but also about SPARQL basic federated query. Fuseki itself is a query server; the query engine is ARQ.
The simplest example of a federated query supported by the ARQ engine (the backend to Fuseki) is described here:
https://jena.apache.org/documentation/query/service.html
You submit this query to your Fuseki endpoint, and it will go off and query the endpoint in the "service <>" brackets. This certainly works for me, using ARQ 2.9.4 with Fuseki 0.2.6.
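A minimal sketch of what such a query looks like, built as a string in Python. The remote endpoint URL and the triple pattern are placeholders, and the helper is not part of Jena or any SPARQL client library.

```python
def federated_query(remote_endpoint, pattern):
    """Wrap a triple pattern in a SERVICE clause aimed at a remote endpoint."""
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {pattern}\n"
        "  }\n"
        "}"
    )


# Placeholder endpoint; in the plan above this would be one per database.
query = federated_query("http://example.org/sparql", "?s ?p ?o")
```

For the plan in the question, one query would contain one SERVICE block per database endpoint, with shared variables joining the results.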