I've got an RDS database with a table containing a ton of data in several columns (some with geospatial data) that I want to search across. SQL queries with good covering indexes on this data are still far too slow to use for something like an AJAX type-ahead suggestion field.
As such, I'm investigating options for search and came across Amazon CloudSearch (now powered by Apache Solr), and it seems to fit my needs. The problem is, I can't seem to find a way via the AWS console to import or provide data from RDS. Am I missing something? Other solutions like Elasticsearch have plugins like river to connect to and transform MySQL data.
I know there are command-line tools for uploading CSV and XML data into CloudSearch. So far the easiest thing I can find is to mysqldump the table into CSV or XML format and manually load it with the CLI tools. Is this, plus some recurring cron job, the best way to get the data in? Something like the sketch below is what I have in mind.
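To make the question concrete, here is a rough sketch of the recurring job I have in mind; the document endpoint, table, and field names are all placeholders, and the batch format follows CloudSearch's 2013-01-01 document API:

```python
# Hypothetical recurring job: dump rows from MySQL on RDS and push them to
# a CloudSearch domain as a JSON document batch. Endpoint, credentials,
# table, and field names below are made up -- substitute your own.
import json

import pymysql
import requests

DOC_ENDPOINT = "https://doc-mydomain-xxxxxxxx.us-east-1.cloudsearch.amazonaws.com"

conn = pymysql.connect(host="my-rds-host", user="user",
                       password="secret", database="mydb")

docs = []
with conn.cursor(pymysql.cursors.DictCursor) as cur:
    cur.execute("SELECT id, title, city, lat, lon FROM ads")
    for row in cur:
        docs.append({
            "type": "add",                 # "add" inserts or replaces a doc
            "id": str(row["id"]),
            "fields": {
                "title": row["title"],
                "city": row["city"],
                # CloudSearch latlon fields take a "lat,lon" string
                "location": f"{row['lat']},{row['lon']}",
            },
        })

resp = requests.post(f"{DOC_ENDPOINT}/2013-01-01/documents/batch",
                     data=json.dumps(docs),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
```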
As of 2014-06-17, this feature is not available on Amazon CloudSearch.
I think AWS Data Pipeline can help. It works like cron, and you can schedule recurring jobs easily with it.
Ran into the same thing: pulling data in directly is only possible if you are using NoSQL via AWS's DynamoDB, not RDS.
Looking into Elasticsearch after finding this out.
Has anyone here tried pulling data from MaestroQA to Snowflake?
There is a way to get data from MaestroQA to Snowflake, but I'm wondering if there's a way in the other direction: Snowflake pulling MaestroQA data, without using any APIs.
In addition, I'm trying to find a way to automate this.
I looked for documentation and threads online, but couldn't find any.
Below are the documents/links I have seen so far, but they describe MaestroQA pushing data to Snowflake.
https://help.maestroqa.com/en/articles/1982484-data-warehouse-table-overview
https://help.maestroqa.com/en/articles/1557390-push-qa-data-to-your-data-warehouse
Snowflake can only load data from its internal/external stages. It has no capability to pull data from arbitrary external systems.
You'll either need to use a tool with ETL capabilities or write your own process in, for example, Python. A minimal sketch of the do-it-yourself route follows.
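This is a minimal sketch, assuming you have already exported MaestroQA data to a local CSV file (Snowflake itself cannot fetch it); the connection parameters, file path, and table name are placeholders:

```python
# Minimal sketch: stage a local CSV in the table's internal stage, then
# COPY it into the table. All names and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount", user="loader", password="secret",
    warehouse="LOAD_WH", database="ANALYTICS", schema="QA",
)
cur = conn.cursor()

# Upload the exported file to the table stage (@%tablename) ...
cur.execute("PUT file:///tmp/maestroqa_export.csv @%QA_SCORES")

# ... then load it into the table.
cur.execute("""
    COPY INTO QA_SCORES
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.close()
```

Scheduling that script with cron (or any job scheduler) covers the automation part.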
We have an ad search website, and all the searches are done through Entity Framework directly querying the SQL Server database.
It was working very well when the database had around 1,000 ads, but now it is reaching 300k, with lots of users searching. Searches are now very slow (using raw SQL didn't help much), and I was instructed to consider Elasticsearch.
I've been through some tutorials, and I get the idea of how it works now, but what I don't know is:
Should I stop using SQL Server to store the ads and start using Elasticsearch instead? What about all the other related data? Is Elasticsearch an alternative to SQL Server?
Each ad has some related data stored in different tables; how would I load it into Elasticsearch? As a single JSON element?
I've read a lot about "billions of documents" being handled by Elasticsearch, so I don't think I would have performance problems with 300k rows in it, correct?
Could anybody help me with these questions?
1- You could still use it; you don't want to search over the complete database, right? Just over the ads. Elasticsearch works with a NoSQL document format, so it is very scalable, and it works with JSON, which gives you an easy way to access the data.
2- When indexing data, you should try to put all the necessary data into the same document (one per SQL row), i.e. a single denormalized JSON object, within reason. Storage is cheap, but computing time isn't.
To index your data, you could either use Filebeat, a program somewhat similar to Logstash, or create your own solution, e.g. a program that reads data from your DB and passes it to Elasticsearch in bulk; see the sketch after this list.
3- Correct, 300k rows is a small quantity, but performance also depends on the memory of the machine hosting Elasticsearch.
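A rough sketch of that do-it-yourself option, assuming the official elasticsearch Python client and pyodbc; the connection string, index, and column names are invented for illustration:

```python
# Rough sketch: read denormalized ads from SQL Server and bulk-index them
# into Elasticsearch. Connection string, index, and columns are placeholders.
import pyodbc
from elasticsearch import Elasticsearch, helpers

sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=ads;UID=user;PWD=secret"
)
es = Elasticsearch("http://localhost:9200")

cur = sql.cursor()
# JOIN the related tables so each row becomes one self-contained document.
cur.execute("""
    SELECT a.Id, a.Title, a.Price, c.Name AS Category, s.Name AS Seller
    FROM Ads a
    JOIN Categories c ON c.Id = a.CategoryId
    JOIN Sellers s ON s.Id = a.SellerId
""")

def actions():
    for row in cur:
        yield {
            "_index": "ads",
            "_id": row.Id,
            "_source": {
                "title": row.Title,
                "price": float(row.Price),
                "category": row.Category,
                "seller": row.Seller,
            },
        }

helpers.bulk(es, actions())  # sends the documents in batched bulk requests
```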
Hope this helps.
I want to switch my Rails project from Solr to Elasticsearch (just for fun), but I'm not sure about the best approach to indexing the documents. Right now I'm using Resque (a background job library) for this task, but I've been digging into "rivers" in Elasticsearch and they look promising.
Can anyone with experience on this topic give me some tips? Performance results? Scalability?
Thanks in advance
P.S.: Although it's just for fun at the moment, I have in mind migrating a larger project in production from Solr to Elasticsearch.
It's hard to understand your situation/concerns from your question. With Elasticsearch, you either push data in, or use a river to pull it.
When you push the data in, you're in control of how your feeder operates, how it processes documents, and how the whole pipeline looks (gather data > language analysis > etc. > index). Using a river may be a convenient way to quickly pull some data into Elasticsearch from a certain source (CouchDB, an RDBMS), or to continuously pull data, e.g. from a RabbitMQ stream.
Since you're considering Elasticsearch in the context of a Rails project, you'll probably try out the Tire gem at some point. Assuming you're using an ActiveModel-compatible ORM (for SQL or NoSQL databases), importing is as easy as:
$ rake environment tire:import CLASS=MyClass
See the Tire documentation and the relevant Railscasts episode for more information.
I need to fetch data from a normalized MSSQL DB and feed it into a Solr index.
I was just wondering whether Apatar can be used to perform the job. I've gone through its documentation but couldn't find the information I'm looking for. It states that it can fetch data from SQL Server and post it over HTTP, but I'm still not sure whether it can post the fetched data as XML over HTTP.
Any advice will be highly valued. Thank you.
I am not familiar with Apatar, but seeing as it is a Java application, it may be a bit challenging to integrate into a Windows environment. However, for various scenarios where I need to fetch data from an MSSQL database and feed it to Solr, I have written custom C# code leveraging the SolrNet client. This tends to be pretty straightforward and simple code, and in the cases where we need to load data at specified intervals, we use scheduled tasks calling a console application. I would recommend checking out the Create/Update section of the SolrNet site for some examples of loading/updating data with the .NET client.
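For a language-neutral illustration of that fetch-and-feed loop (not SolrNet itself), here is a rough Python sketch posting documents to Solr's standard JSON update handler; the server, core, table, and field names are placeholders:

```python
# Rough illustration: read rows from MSSQL and post them to Solr's JSON
# update handler. Server, core, table, and field names are placeholders.
import pyodbc
import requests

SOLR_UPDATE = "http://localhost:8983/solr/mycore/update"

sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=catalog;UID=user;PWD=secret"
)
cur = sql.cursor()
cur.execute("SELECT Id, Name, Description FROM Products")

docs = [{"id": str(r.Id), "name": r.Name, "description": r.Description}
        for r in cur]

# commit=true makes the new documents searchable immediately.
resp = requests.post(SOLR_UPDATE, params={"commit": "true"}, json=docs)
resp.raise_for_status()
```

Run on a schedule (cron or Windows Task Scheduler), this is essentially what the console-application approach above does.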
Where do I find a how-to for setting up Elasticsearch with Postgres?
My fields will be about 350 MB each in size (yes, megabytes). I have a text output of all of the US Code and all decisions from all the courts, the Statutes at Large, pretty much everything you would find in a library, and I need to be able to do full-text searches and return the exact point in the field to the app, so it can return the exact page in PDF form. Postgres can easily handle the datastore, but I've never used Elasticsearch and have no idea how it integrates into the indexing, etc.
As of 2015, there's ZomboDB (https://github.com/zombodb/zombodb). As the author, I'm a bit biased, but it's quite powerful. ;)
It's a Postgres extension and Elasticsearch plugin that lets you CREATE INDEXes that use a remote Elasticsearch cluster, and it exposes a fairly powerful query language for performing full-text searches.
Because it's an actual index in Postgres, the ES cluster is automatically synchronized as you INSERT/UPDATE/DELETE records. As such, there's no need for asynchronous synchronization processes.
Additionally, because it's an actual index, it is transaction-safe, which means concurrent Postgres sessions will only see results that are consistent with their current transaction.
Here's a link to ZomboDB's tutorial. It should give you an idea of how easy ZomboDB is to use.
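As a hedged sketch of what that looks like (the DDL below follows the style of recent ZomboDB releases, and the table, index, and cluster URL are placeholders; consult the tutorial for your version's exact syntax):

```python
# Hedged sketch of ZomboDB usage from Python via psycopg2. Table, index,
# and Elasticsearch URL are placeholders; DDL syntax varies by version.
import psycopg2

conn = psycopg2.connect("dbname=law user=postgres")
conn.autocommit = True
cur = conn.cursor()

# Create a ZomboDB index backed by a remote Elasticsearch cluster.
cur.execute("""
    CREATE INDEX idx_documents
        ON documents
     USING zombodb ((documents.*))
      WITH (url='http://localhost:9200/')
""")

# Full-text query through the index; ==> is ZomboDB's query operator.
cur.execute("SELECT id, title FROM documents WHERE documents ==> 'habeas corpus'")
for row in cur.fetchall():
    print(row)
```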
There is an application that you can use to import SQL Server, Oracle, PostgreSQL, MySQL, etc. into an Elasticsearch index.
http://code.google.com/p/ogr2elasticsearch/
Please let me know if you have any trouble building or using it. ~Adam
You can explore using pgsync.
PGSync is open-source middleware (written in Python) for syncing data from Postgres to Elasticsearch effortlessly. It allows you to keep Postgres as your source of truth and expose structured, denormalized documents in Elasticsearch.
GitHub link: https://github.com/toluaina/pgsync
It's possible to insert/update/delete Postgres data in Elasticsearch without any middleware other than the pgsql-http extension. Using triggers, you can get pretty much real-time index updates.
You can also query Elasticsearch and use the results within Postgres to do joins, etc., with other tables/data in your database.
A minimal trigger sketch is shown below; for fuller versions, see the Elasticsearch examples: https://github.com/sysadminmike/pgsql-http_examples
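A minimal sketch of the trigger approach, assuming the pgsql-http extension is installed; the table, index name, and Elasticsearch endpoint are placeholders:

```python
# Minimal sketch of the trigger approach, executed via psycopg2. Assumes
# the pgsql-http extension is installed; table, index, and URL are
# placeholders. (Use EXECUTE PROCEDURE instead on Postgres < 11.)
import psycopg2

conn = psycopg2.connect("dbname=ads user=postgres")
conn.autocommit = True
cur = conn.cursor()

cur.execute("""
CREATE OR REPLACE FUNCTION ads_to_es() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        PERFORM http_delete('http://localhost:9200/ads/_doc/' || OLD.id);
        RETURN OLD;
    END IF;
    -- PUT upserts the whole row as a JSON document keyed by id.
    PERFORM http_put('http://localhost:9200/ads/_doc/' || NEW.id,
                     row_to_json(NEW)::text,
                     'application/json');
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER ads_es_sync
AFTER INSERT OR UPDATE OR DELETE ON ads
FOR EACH ROW EXECUTE FUNCTION ads_to_es();
""")
```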