I'm implementing my very first Solr-based search application. The application currently uses a database server and local files (txt, xml) as data sources.
I was wondering if it's possible to show the source of each document in the results. Is it possible to say, for example: Result1 from 1.txt, Result2 from the database, etc.?
You can create an extra field, for example source, to hold that information. The field can be populated straight from the data importer, which makes it pretty straightforward.
The "requirements" are something like this: as a user I want to upload a dataset in CSV, JSON, or other supported text formats and be able to do basic REST queries against it such as selecting all first names in the dataset or select the first 10 rows.
I'm struggling to think of the "best" way to store this data. While, off the bat I don't think this will generate millions of datasets, it seems generally bad to create a new table for every dataset for a user as, eventually, I would hit the inode limit. I could store as flat files in something like S3 that's cached but then it still does require opening and parsing the file to query it.
Is this a use case for the JSON type in Postgres? If not, what would be the "right" format and place to store this data?
I want to export some data from OpenTSDB and then import it into DolphinDB.
In OpenTSDB, the metrics are device_id and ssid; the tags are battery_level, battery_status, battery_temperature, bssid, cpu_avg_1min, cpu_avg_5min, cpu_avg_15min, mem_free, mem_used, and rssi.
In DolphinDB, I create a table as below:
COLS_READINGS = `time`device_id`battery_level`battery_status`battery_temperature`bssid`cpu_avg_1min`cpu_avg_5min`cpu_avg_15min`mem_free`mem_used`rssi`ssid
TYPES_READINGS = `DATETIME`SYMBOL`INT`SYMBOL`DOUBLE`SYMBOL`DOUBLE`DOUBLE`DOUBLE`LONG`LONG`SHORT`SYMBOL
schema_readings = table(COLS_READINGS, TYPES_READINGS)
I know that a CSV text file can be imported into DolphinDB, but I don't know how to export data to a CSV text file from OpenTSDB. Is there an easy way to do this?
Assuming you're using an HBase backend, the easiest way would be to access that directly. The OpenTSDB schema describes in detail how to get the data you need.
The data is stored in one big table, but to save space, all metric names, tag keys and tag values are referenced using UIDs. These UIDs can be looked up in the UID table which stores that mapping in both directions.
You can write a small exporter in a language of your choice. The OpenTSDB code comes with an HBase client library, asynchbase, and has some tools to parse the raw data in its Internal class which can make it a bit easier.
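As a very rough illustration, a minimal exporter in Java using asynchbase might look like the sketch below. It only dumps the raw, still-encoded rows of the data table; resolving UIDs back to metric and tag names via the tsdb-uid table and decoding the qualifiers into timestamps and values are left out, and the ZooKeeper quorum, table name, and output file are assumptions.

import java.io.FileWriter;
import java.util.ArrayList;

import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;
import org.hbase.async.Scanner;

// Rough sketch: scan OpenTSDB's raw HBase data table and dump the rows to a file.
// Assumes the default table name "tsdb" and a local ZooKeeper quorum;
// UID-to-name resolution and value decoding are intentionally omitted.
public class TsdbRawExporter {
    public static void main(String[] args) throws Exception {
        HBaseClient client = new HBaseClient("localhost");   // ZooKeeper quorum (assumption)
        Scanner scanner = client.newScanner("tsdb");          // default OpenTSDB data table
        try (FileWriter out = new FileWriter("tsdb-raw.csv")) {
            ArrayList<ArrayList<KeyValue>> rows;
            while ((rows = scanner.nextRows(1000).join()) != null) {
                for (ArrayList<KeyValue> row : rows) {
                    for (KeyValue kv : row) {
                        // Row key = metric UID + base timestamp + tag UIDs; still encoded here.
                        out.write(bytesToHex(kv.key()) + "," + bytesToHex(kv.qualifier())
                                + "," + bytesToHex(kv.value()) + "\n");
                    }
                }
            }
        } finally {
            client.shutdown().join();
        }
    }

    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}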
I have data in two formats, CSV and TEXT.
1) The CSV file contains metadata, i.e. ModifyScore, Size, fileName, etc.
2) The actual text is in Text folders containing files like a.txt, b.txt, etc.
Is it possible to index such data in Solr in a single core through DIH, or in some other way?
Given your use case, I would proceed with a custom indexing app.
Apparently you want to build each Solr document by fetching some fields from the CSV and another field (the content) from the TXT files.
Using Java, for example, it is going to be quite simple:
You can use SolrJ: fetch the data from the CSV and TXT files, build each Solr document, and then index it.
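A minimal sketch of that idea with SolrJ is below (assuming SolrJ 6+; the CSV layout, field names, file paths, and core URL are all made up for illustration, and the naive split(",") should be swapped for a real CSV parser):

import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Sketch: read metadata rows from a CSV, pull the matching text file's content,
// build one Solr document per row, and index it. Field/column names are assumptions.
public class CsvTxtIndexer {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        try (BufferedReader csv = new BufferedReader(new FileReader("metadata.csv"))) {
            String line = csv.readLine();                    // skip header row
            while ((line = csv.readLine()) != null) {
                String[] cols = line.split(",");             // use a real CSV parser in practice
                String fileName = cols[2];                   // assumed column order: modifyScore,size,fileName
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", fileName);
                doc.addField("modifyScore", cols[0]);
                doc.addField("size", cols[1]);
                doc.addField("content", new String(Files.readAllBytes(Paths.get("text", fileName))));
                solr.add(doc);
            }
        }
        solr.commit();
        solr.close();
    }
}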
I would use the DIH if I could move the data into a DB (even two tables are fine, as DIH supports joins).
Out of the box, you may be interested in using the script transformer [1].
Using it in combination with your different data sources could work.
You'll need to play with it a bit, as it's not a direct solution to your problem.
[1] https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheScriptTransformer
Just to mention a couple more possibilities:
Use DIH to index the txt files into collectionA, use the /update handler to ingest the CSV directly into collectionB, then use Streaming Expressions to merge both into a third collection, which is the one you want to keep. The main advantage is that everything stays in Solr, with no external code.
Use DIH to index the files (or /update to index the CSV) and write an Update Request Processor that intercepts docs before they are indexed, looks up the info from the other source, and adds it to the doc (a sketch of such a processor is shown after this list).
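For the second option, a bare-bones Update Request Processor could look like the sketch below; the CSV lookup is reduced to an in-memory map, and the field, file, and class names are assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Sketch of a factory + processor that enriches each incoming document with
// metadata looked up from a CSV file keyed by file name. Names are assumptions.
public class CsvEnrichProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                String fileName = (String) doc.getFieldValue("id");
                String meta = loadCsvIndex().get(fileName);
                if (meta != null) {
                    doc.addField("modifyScore", meta);        // enrich the doc before it is indexed
                }
                super.processAdd(cmd);                        // hand off to the next processor in the chain
            }
        };
    }

    // Naive CSV read on every call; cache this in a real implementation.
    private Map<String, String> loadCsvIndex() throws IOException {
        Map<String, String> index = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("metadata.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",");              // assumed layout: fileName,modifyScore
                index.put(cols[0], cols[1]);
            }
        }
        return index;
    }
}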
Yes, it is possible. For information and code on how to index data from multiple heterogeneous data sources, see "why the tikaEntityProcessor does not index the Text field in the following data-config file?"
I have several CSV files and their corresponding tables (which have the same columns as the CSVs, with appropriate data types) in the database, each table named the same as its CSV. So every CSV has a table in the database.
I somehow need to map them all dynamically. Once I run the mapping, the data from all the CSV files should be transferred to the corresponding tables. I don't want a different mapping for every CSV.
Is this possible through Informatica?
Appreciate your help.
PowerCenter does not provide such feature out-of-the-box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
My understanding is that you have many CSV files with different column layouts and you need to load them into the appropriate tables in the database.
Approach 1: Any RDBMS you use should have some kind of import option. Explore that route to create tables based on the CSV files. This is a manual task.
Approach 2: Open the CSV file and write a formula that uses the header to generate a CREATE TABLE statement. Execute the formula's output in your DB, so all the tables get created. Now use Informatica to read the CSVs, import the table definitions as targets, and load the data into them.
Approach 3: Using Informatica alone. You would need to do a lot of coding to create a dynamic mapping on the fly.
Proposed solution:
Mapping 1:
1. Read the CSV file and pass the header information to a Java transformation.
2. The Java transformation should normalize and split the header column into rows; you can write them to a text file.
3. Now you have all the columns in a text file. Read this text file and use a SQL transformation to create the tables in the database.
Mapping 2:
Now that the table is available, read the CSV file (excluding the header) and load the data into the table created by Mapping 1 via a SQL transformation (INSERT statement).
You can follow this approach for all the CSV files. I haven't tried this solution at my end, but I am sure the above approach would work. A rough sketch of the header-to-CREATE TABLE step is shown below.
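Outside of Informatica, the header-to-DDL step from Approach 2 / Mapping 1 is easy to prototype. A hedged sketch in Java, assuming the table is named after the file and every column starts out as VARCHAR (the file name is hypothetical):

import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.file.Paths;

// Sketch: turn a CSV header row into a CREATE TABLE statement.
// Assumes the table is named after the file and every column is VARCHAR(255);
// refine the data types afterwards as needed.
public class CsvToCreateTable {
    public static void main(String[] args) throws Exception {
        String csvPath = args.length > 0 ? args[0] : "customers.csv";   // hypothetical file
        String fileName = Paths.get(csvPath).getFileName().toString();
        String tableName = fileName.replaceFirst("\\.csv$", "");

        try (BufferedReader in = new BufferedReader(new FileReader(csvPath))) {
            String[] columns = in.readLine().split(",");                // header row
            StringBuilder ddl = new StringBuilder("CREATE TABLE " + tableName + " (\n");
            for (int i = 0; i < columns.length; i++) {
                ddl.append("    ").append(columns[i].trim()).append(" VARCHAR(255)");
                ddl.append(i < columns.length - 1 ? ",\n" : "\n");
            }
            ddl.append(");");
            System.out.println(ddl);    // execute this against the target database
        }
    }
}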
If you're not doing any transformations, it's wise to use the database's import option (e.g. a BTEQ script in Teradata). But if you are doing transformations, you have to create as many sources and targets as the number of files you have.
On the other hand, you can achieve this in one mapping:
1. Create a separate flow for every file (i.e. Source-Transformation-Target) in the single mapping.
2. Use the target load plan to choose which file gets loaded first.
3. Configure the file names and the corresponding database table names in the session for that mapping.
If all the mappings (should you have to create them separately) are the same, use the indirect file method. You will find this option in the session properties, under the Mapping tab's source options; the default is Direct, so change it to Indirect.
I don't have the tool at hand right now to explore further and guide you clearly, but look into the indirect file load type in Informatica. I am sure it will meet the requirement.
I have written a workflow in Informatica that does it, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a time stamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure gets to work and then transfers the data from the Oracle table into their corresponding destination staging tables and finally the Data Warehouse. So if I have to add a new file or a feed, I have to make changes in configuration tables only. No changes are required either to the Informatica Objects or the db objects. So the short answer is yes this is possible but it is not an out of the box feature.
We have a customer requesting data in XML format. Normally this is not required as we usually just hand off an Access database or csv files and that is sufficient. However in this case I need to automate the exporting of proper XML from a dozen tables.
If I can do it out of SQL Server 2005, that would be preferred. However, I can't for the life of me find a way to do this. I can dump out raw XML data, but this is just a tag per row with attribute values. We need something that represents the structure of the tables. Access has an XML export that meets our needs, but I'm not sure how it can be automated. It doesn't appear to be available in any way through SQL, so I'm trying to track down the necessary code to export the XML through a macro or VBScript.
Any suggestions?
Look into using FOR XML AUTO. Depending on your requirements, you might need to use EXPLICIT.
As a quick example:
SELECT *
FROM Customers
    INNER JOIN Orders ON Orders.CustID = Customers.CustID
FOR XML AUTO
This will generate a nested XML document with the orders inside the customers. You could then use SSIS to export that to a file pretty easily, I would think. I haven't tried it myself, though.
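If SSIS isn't an option, any small client program can run the query and write the result to disk instead. A hedged JDBC sketch follows; the driver URL, credentials, and table names are assumptions, and the loop concatenates the chunked rows a FOR XML result can come back as:

import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: run a FOR XML query and save the generated XML fragment to a file.
// Connection details and table names are assumptions; wrap the fragment in a
// root element (or add a ROOT directive / XSLT step) if a full document is needed.
public class ForXmlExport {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost;databaseName=Sales;user=sa;password=secret";
        String sql = "SELECT * FROM Customers "
                   + "INNER JOIN Orders ON Orders.CustID = Customers.CustID "
                   + "FOR XML AUTO";

        StringBuilder xml = new StringBuilder();
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                xml.append(rs.getString(1));   // FOR XML output may arrive split across rows
            }
        }
        try (FileWriter out = new FileWriter("customers.xml")) {
            out.write(xml.toString());
        }
    }
}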
If you want a document instead of a fragment, you'll probably need a two-part solution. However, both parts could be done in SQL Server.
It looks from the comments on Tom's entry like you found the ELEMENTS argument, so you're getting the fields as child elements rather than attributes. You'll still end up with a fragment, though, because you won't get a root node.
There are different ways you could handle this. SQL Server provides a method for using XSLT to transform XML documents, so you could create an XSL stylesheet to wrap the result of your query in a root element. You could also add anything else the customer's schema requires (assuming they have one).
If you wanted to leave some fields as attributes and make others elements, you could also use XSLT to move those fields, so you might end up with something like this:
<customer id="204">
<firstname>John</firstname>
<lastname>Public</lastname>
</customer>
There's an outline here of a macro used to export data from an Access DB to an XML file, which may be of some use to you.
Const acExportTable = 0
Set objAccess = CreateObject("Access.Application")
objAccess.OpenCurrentDatabase "C:\Scripts\Test.mdb"
'Export the table "Inventory" to test.xml
objAccess.ExportXML acExportTable,"Inventory","c:\scripts\test.xml"
The easiest way to do this that I can think of would be to create a small app to do it for you. You could do it as a basic WinForm and then just make use of a LinqToSql dbml class to represent your database. Most of the time you can just serialize those objects using the XmlSerializer class. Occasionally it is more difficult than that, depending on the complexity of your database. Check out this post for some detailed info on LinqToSql and XML serialization:
http://www.west-wind.com/Weblog/posts/147218.aspx
Hope that helps.