I work for a large organization with many different relational datastores containing overlapping information. We are looking for a solution for integrated querying on all our data at once. We are considering using Semantic Web technology for this purpose. Specifically, we plan to:
Create a unified ontology
Map each database to this ontology
Create a SPARQL endpoint for each database
Use a federation engine to unify them to one endpoint.
I am now in search of appropriate tools for the last stage of this plan. I have heard that Fuseki is appropriate for this case, but have been unable to find any relevant documentation.
Can you please give your opinion on the appropriateness of Fuseki for this task, or even better, point me at some proper documentation?
Thanks in advance!
Oren
http://jena.apache.org/
You want to read about Fuseki, but also about SPARQL basic federated query. Fuseki itself is a query server; the query engine is ARQ.
This is the simplest example of a federated query that is supported by the ARQ engine (the backend to Fuseki):
https://jena.apache.org/documentation/query/service.html
You submit this query to your Fuseki endpoint, and it will go off and query the endpoint named in the "service <>" brackets. This certainly works for me, using ARQ 2.9.4 with Fuseki 0.2.6.
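As a rough illustration of what such a request looks like, here is a minimal Python sketch that builds a federated query string and the URL used to submit it. The endpoint URLs (`http://localhost:3030/ds/query`, `http://example.org/sparql`) and the function names are assumptions made up for this example; nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_federated_query(remote_endpoint: str, limit: int = 10) -> str:
    """Return a SPARQL query whose pattern is evaluated at remote_endpoint."""
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE {\n"
        # The SERVICE clause names the remote endpoint to be queried.
        f"  SERVICE <{remote_endpoint}> {{\n"
        "    ?s ?p ?o .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(fuseki_query_url: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL protocol GET."""
    return fuseki_query_url + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Both URLs are hypothetical placeholders.
    q = build_federated_query("http://example.org/sparql")
    print(q)
    print(build_request_url("http://localhost:3030/ds/query", q))
```

With one SERVICE block per database endpoint in the same WHERE clause, the local engine joins the results, which is the federation step of the plan in the question.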
I want to know the pros and cons of using Fusion instead of regular Solr. Can you give some examples (such as a problem that can be solved easily using Fusion)?
First of all, I should disclose that I am the Product Manager for Lucidworks Fusion.
You seem to already be aware that Fusion works with Solr (or one or more Solr clusters or instances), using Solr for data storage and querying. The purpose of Fusion is to make it easier to use Solr, integrate Solr, and to build complex solutions that make use of Solr. Some of the things that Fusion provides that many people find helpful for this include:
Connectors and a connector framework. Bare Solr gives you a good API and the ability to push certain types of files at the command line. Fusion comes with several pre-built data source connectors that fetch data from various types of systems, process them as appropriate (including parsing, transformation, and field mapping), and send the results to Solr. These connectors include common document stores (cloud and on-premise), relational databases, NoSQL data stores, HDFS, enterprise applications, and a very powerful and configurable web crawler.
Security integration. Solr does not have any authentication or authorization (though as of version 5.2 this week, it does have a pluggable API and a basic implementation of Kerberos for authentication). Fusion wraps the Solr APIs with a secured version. Fusion has clean integrations into LDAP, Active Directory, and Kerberos for authentication. It also has a fine-grained authorization model for managing and configuring Fusion and Solr. And the Fusion authorization model can automatically link group memberships from LDAP/AD with access control lists from the Fusion Connectors data sources, so that you get document-level access control mirrored from your source systems when you run search queries.
Pipelines processing model. Fusion provides a pipeline model with modular stages (in both API and GUI form) to make it easier to define and edit transformations of data and documents. It is analogous to unix shell pipes. For example, while indexing you can include stages to define mappings of fields, compute new fields, aggregate documents, pull in data from other sources, etc. before writing to Solr. When querying, you could do the same, along with transforming the query, running and returning the results of other analytics, and applying security filtering.
Admin GUI. Fusion has a web UI for viewing and configuring the above (as well as the base Solr config). We think this is convenient for people who want to use Solr, but don't use it regularly enough to remember how to use the APIs, config files, and command line tools.
Sophisticated search-based features: Using the pipelines model described above, Fusion includes (and makes easy to use) some richer search-based components, including: Natural language processing and entity extraction modules; Real-time signals-driven relevancy adjustment. We intend to provide more of these in the future.
Analytics processing: Fusion includes and integrates Apache Spark for running deep analytics against data stored in Solr (or on its way in to Solr). While Solr implicitly includes certain data analytics capabilities, that is not its main purpose. We use Apache Spark to drive Fusion's signals extraction and relevancy tuning, and expect to expose APIs so users can easily run other processing there.
Other: many useful miscellaneous features like: dashboarding UI; basic search UI with manual relevancy tuning; easier monitoring; job management and scheduling; real-time alerting with email integration, and more.
A lot of the above can of course be built or written against Solr, without Fusion, but we think that providing these kinds of enterprise integrations will be valuable to many people.
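To make the "pipeline of modular stages" idea concrete, here is a small Python sketch of the general pattern. It is not Fusion's API; the stage helpers (`rename_field`, `compute_field`) and the field names are hypothetical and only show how stages compose like unix shell pipes before a document would be written to Solr.

```python
from functools import reduce
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def rename_field(old: str, new: str) -> Stage:
    """Stage that maps one field name to another (a field-mapping step)."""
    def stage(doc: Document) -> Document:
        out = dict(doc)  # copy so that stages never mutate their input
        if old in out:
            out[new] = out.pop(old)
        return out
    return stage


def compute_field(name: str, fn: Callable[[Document], object]) -> Stage:
    """Stage that adds a new field computed from the document."""
    def stage(doc: Document) -> Document:
        out = dict(doc)
        out[name] = fn(doc)
        return out
    return stage


def run_pipeline(stages: List[Stage], doc: Document) -> Document:
    """Feed the document through each stage in order."""
    return reduce(lambda d, stage: stage(d), stages, doc)


if __name__ == "__main__":
    index_pipeline = [
        rename_field("title_s", "title"),
        compute_field("title_length", lambda d: len(str(d["title"]))),
    ]
    print(run_pipeline(index_pipeline, {"title_s": "Hello"}))
```

Because each stage is just a function from document to document, a stage holding common logic can be reused across several pipelines.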
Pros:
Connectors: Lucidworks provides a wide range of connectors, which you can use to connect to data sources and pull the data from them.
Reusability: In Lucidworks you can create pipelines for data ingestion and data retrieval. You can create pipelines with common logic so that these can be used in other pipelines.
Security: You can apply restrictions over data, i.e. security trimming. Lucidworks provides built-in query-pipeline stages for security trimming, or you can write a custom pipeline for your use case.
Troubleshooting: Lucidworks comes with discrete services, i.e. api, connectors, solr. You can troubleshoot any issue by service, since each service has its own logs. You can also configure JVM properties for each service.
Support: Lucidworks support is available 24/7. You can create a support case according to its severity and they will schedule a call with you.
Cons:
Not much, but it keeps you away from normal development: you don't get much chance to open your IDE and start coding.
With Amazon's CloudSearch being powered by Solr, I have certain questions before we proceed. Maybe someone who has experience with both can guide us.
How compatible is the Amazon CloudSearch API with the Solr API? Are they the same, or radically different?
Is it compatible with queries being performed via SolrNet?
How different is it from Solr?
The reason we're asking is that we need to migrate one application from Solr to Amazon CloudSearch, and before we proceed we need some idea of how this works.
I checked the Amazon CloudSearch documentation, but was unable to find any details on this particular point.
Amazon CloudSearch is based on Solr, so conceptually they work the same way, but Amazon has written its own wrapper on top of the Solr API. Answers to the questions, in the same order, are below:
Amazon CloudSearch has two endpoints, one for search and the other for indexing documents. It provides multiple ways of indexing, such as S3 documents, JSON, XML, etc.
Amazon CloudSearch has four different query parsers. If you submit queries using Lucene as the query parser, the query syntax and functionality are the same.
The only functional difference I observed is that CloudSearch doesn't support hierarchy in field data; it only provides text and literal datatypes, plus arrays of them for multiple values.
There are migration tools available, such as http://www.8kmiles.com/blog/apache-solr-to-amazon-cloudsearch-migration-tool/, which support migration from Solr to CloudSearch.
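As a hedged illustration of the query-parser point, the sketch below builds a search request URL that selects the Lucene parser through the `q.parser` parameter. The endpoint host is a made-up placeholder, the `2013-01-01` API version in the path is an assumption about the domain being queried, and `build_search_url` is a hypothetical helper; no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_url(search_endpoint: str, query: str,
                     parser: str = "lucene", size: int = 10) -> str:
    """Build a CloudSearch search URL for the given query and parser."""
    params = {
        "q": query,          # the query text, here in Lucene syntax
        "q.parser": parser,  # which query parser should interpret q
        "size": size,        # number of hits to return
    }
    return f"https://{search_endpoint}/2013-01-01/search?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search endpoint; a real one is shown in the domain dashboard.
    endpoint = "search-example.us-east-1.cloudsearch.amazonaws.com"
    print(build_search_url(endpoint, "title:solr AND year:[2010 TO 2015]"))
```

Note that a Solr client library such as SolrNet builds Solr-style request URLs, so even with the Lucene parser the request layer would need to change; only the query string itself carries over.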
How can I read multiple tables from Salesforce?
Can I use a Groovy script? Or does the Salesforce connector provide a facility for reading multiple tables?
Is there any other way?
The SFDC connector supports the Salesforce Object Query Language (SOQL).
It's similar to SQL but it has its limitations, and JOINs are one of them.
SOQL does offer a way to query more than one table at a time by using relationship queries. It may not always be what you need, but it gets you closer.
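For illustration, here is a small Python sketch that assembles the two usual shapes of relationship query as plain strings: parent-to-child (a nested subquery) and child-to-parent (dotted field names). The helper names are hypothetical; the example uses the standard Account and Contact objects, and the resulting string would be pasted into the connector's query field.

```python
from typing import Sequence


def parent_to_child_query(parent: str, parent_fields: Sequence[str],
                          relationship: str, child_fields: Sequence[str]) -> str:
    """Parent rows, each with a nested subquery over a child relationship."""
    subquery = f"(SELECT {', '.join(child_fields)} FROM {relationship})"
    select_list = list(parent_fields) + [subquery]
    return f"SELECT {', '.join(select_list)} FROM {parent}"


def child_to_parent_query(child: str, child_fields: Sequence[str],
                          parent_relationship: str,
                          parent_fields: Sequence[str]) -> str:
    """Child rows, with parent fields reached through dot notation."""
    dotted = [f"{parent_relationship}.{field}" for field in parent_fields]
    select_list = list(child_fields) + dotted
    return f"SELECT {', '.join(select_list)} FROM {child}"


if __name__ == "__main__":
    # Accounts together with their contacts (child relationship "Contacts").
    print(parent_to_child_query("Account", ["Name"], "Contacts", ["LastName"]))
    # Contacts together with the name of their parent account.
    print(child_to_parent_query("Contact", ["LastName"], "Account", ["Name"]))
```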
Please take a look at the SFDC documentation: http://www.salesforce.com/us/developer/docs/officetoolkit/Content/sforce_api_calls_soql_relationships.htm
Again, this language works with the SFDC connector, so if you need more information about it you'll be better off asking in an SFDC forum ;)
HTH
You can use the native query language instead of the DataSense query language in your Salesforce connector for the query.
I know that there are several similar posts discussing the same topic, but I didn't find the answer for my case.
I have just a basic idea of LDAP, the kind you can get through a Google search: it is a directory database, used for hierarchical data and optimised for reads rather than writes. And of course LDAP is the protocol used to access the database.
A little background of problem:
We have to create a presence service(publish-subscribe) for which we
have to choose between a directory based DB and an RDBMS.
The DB will be on cloud so if RDBMS is chosen it will be exposed as
a Web Service and if a directory based is chosen it will be accessed
via LDAP.
The service is a pub-sub model where each user may be a publisher with many subscribers and may itself be a subscriber. So, it is an m:n relationship.
Now, I have two questions regarding the same.
Can we model this in directory based database? I looked through the
schemas but could not figure out how to do that.
Second question is regarding the approach of accessing the data i.e.
using LDAP or using web service. I don't know what are the
advantages/disadvantages of using LDAP over usage of web service.
Appreciate any help.
Thanks
Can we model this in directory based database? I looked through the schemas but could not figure out how to do that.
There isn't an existing schema for this that I'm aware of. All the schemas I've ever seen are for X.500 directory information, not messages or transactions.
I wouldn't give LDAP a moment's consideration for a publish/subscribe system. I wouldn't use a database either. I would use a messaging system, e.g. MQ.
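To show why this is a messaging problem rather than a schema problem, here is a toy in-process sketch of the m:n presence relationship in Python. It is only an illustration of the publish/subscribe pattern, not a substitute for a real messaging system; the `PresenceBroker` class and the user names are made up for the example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class PresenceBroker:
    """Toy broker: any user can publish, and any user can subscribe to others."""

    def __init__(self) -> None:
        # publisher -> set of subscribers (one side of the m:n relationship)
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        # subscriber -> list of (publisher, status) notifications received
        self.inbox: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def subscribe(self, subscriber: str, publisher: str) -> None:
        self._subscribers[publisher].add(subscriber)

    def publish(self, publisher: str, status: str) -> None:
        # Deliver the status change to everyone watching this publisher.
        for subscriber in sorted(self._subscribers[publisher]):
            self.inbox[subscriber].append((publisher, status))


if __name__ == "__main__":
    broker = PresenceBroker()
    broker.subscribe("bob", "alice")    # bob watches alice
    broker.subscribe("carol", "alice")  # carol watches alice
    broker.subscribe("alice", "bob")    # alice is also a subscriber
    broker.publish("alice", "online")
    print(dict(broker.inbox))
```

A message broker provides this fan-out, plus delivery guarantees and persistence, without you having to model subscriptions as directory entries or table rows.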
Second question is regarding the approach of accessing the data i.e. using LDAP or using web service. I don't know what are the advantages/disadvantages of using LDAP over usage of web service.
LDAP isn't suitable anyway, and making this decision won't make it any more suitable.
Are there any cloud hosting solutions for geospatial data? I am currently writing a directory style app where businesses can sign up and then users can find nearby ones.
I am considering Google App Engine for this, but from what I can tell the GeoModel code is quite expensive (up to tens of thousands of dollars a year) to run since Google updated the pricing of App Engine. It doesn't seem like App Engine's database is really suited to this kind of query (though the SQL solution may be an answer).
I was hoping to find a service where I could send off a HTTP request to add data (a business' id, name and icon url) to a database, and then another one to find a list of businesses that are nearby to a given point. A service is preferable as this is work done for a client and we would like the solution to be managed with as little interaction from us or the client needed as possible.
EDIT:
I just found cartodb.com, which uses PostgreSQL and is reasonably priced. Are there any other alternatives?
The App Engine Search API (currently in Experimental) supports GeoPoints and geosearch, and is great for exactly the kind of query that you describe.
See the Google Developers Academy (GDA) App Engine Search API classes for a bit more info and an example as well.
http://www.iriscouch.com/ is a cloud-based host for CouchDB and they support the geocouch extensions for CouchDB to store geoJSON data and perform spatial queries.
We have decided to go with cartodb.com because it looks like they have a good price to ease of use ratio.
You mentioned going with CartoDB, which is a good choice with a nice UI.
Just adding: if you were just looking for a scalable backend, you could use StormDB. It is a cloud-hosted SQL database with geospatial extensions. Your data is automatically distributed amongst multiple nodes for write, read, and parallel query scalability.
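Whichever backend is chosen, the "find businesses near a point" lookup from the question comes down to a distance filter. The Python sketch below shows that logic with the haversine formula on an in-memory list; a hosted geospatial database would do the same with a spatial index instead of a full scan. The record fields (`id`, `name`, `icon_url`, `lat`, `lon`) and the sample data are assumptions for illustration.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(businesses: List[Dict], lat: float, lon: float,
           radius_km: float) -> List[Dict]:
    """Return businesses within radius_km of the point, nearest first."""
    hits = [(haversine_km(lat, lon, b["lat"], b["lon"]), b) for b in businesses]
    hits.sort(key=lambda pair: pair[0])
    return [b for distance, b in hits if distance <= radius_km]


if __name__ == "__main__":
    # Hypothetical sample records.
    businesses = [
        {"id": 1, "name": "Cafe", "icon_url": "http://example.org/a.png",
         "lat": 0.0, "lon": 0.01},
        {"id": 2, "name": "Bakery", "icon_url": "http://example.org/b.png",
         "lat": 0.0, "lon": 1.0},
    ]
    print([b["name"] for b in nearby(businesses, 0.0, 0.0, radius_km=5)])
```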