Does Solr data import handler support custom variables?

I currently have an issue with my data import handler where ${dataimporter.last_index_time} is not granular enough to capture two events that happen within a second of each other, leading to issues where a record is skipped over in my database.
I am thinking of replacing last_index_time with a simple atomically incrementing value rather than a datetime, but in order to do that I need to be able to set and read custom variables through Solr that can be referenced in my data-config.xml file.
Alternatively, if I could find some way to set dataimporter.last_index_time, that would work just as well, as I could ensure that the last_index_time is less than the newly-committed rows' timestamps (and more importantly, that it is set from the same clock).
Does Solr support this?

Short answer: yes, it does.
Long answer:
At work I pass parameters in the request (see DataImportHandler: Accessing request parameters), with default values set in the handler definition in solrconfig.xml.
To sum up:
You can use something like this in data-config.xml:
${dataimporter.request.your_variable}
With a request like:
/dataimport?command=delta-import&clean=false&commit=true&your_variable=123
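To illustrate (the handler definition, entity, and column names below are only placeholders): a default value for the parameter can be declared on the /dataimport handler in solrconfig.xml, and the parameter is then referenced from data-config.xml, for example in a deltaQuery:

<!-- solrconfig.xml: default value for the custom request parameter -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="your_variable">0</str>
  </lst>
</requestHandler>

<!-- data-config.xml: use the request parameter instead of last_index_time -->
<entity name="item"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE revision &gt; '${dataimporter.request.your_variable}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dataimporter.delta.id}'"/>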

Related

Convert FileTime to DateTime in Azure Logic App

I'm pretty new to Logic Apps, so I'm still learning my way around custom expressions. One thing I cannot seem to figure out is how to convert a FileTime value to a DateTime value.
FileTime value example: 133197984000000000
I don't have a desired output format, as long as Logic App can understand that this is a DateTime value and can run before/after date logic.
To achieve your requirement, I converted the Windows FileTime to Unix time (seconds) and then to a DateTime by adding those seconds to the base date 1970-01-01T00:00:00Z. Here is the official documentation that I followed. Below is the expression that worked for me.
addSeconds('1970-01-01T00:00:00Z', div(sub(133197984000000000,116444736000000000),10000000))
Results: (screenshot omitted)
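If you want to sanity-check the arithmetic behind that expression outside Logic Apps, here is a minimal Python sketch of the same conversion (the constants are the standard FileTime/Unix epoch offsets):

from datetime import datetime, timezone

# Windows FileTime counts 100-nanosecond intervals since 1601-01-01T00:00:00Z.
# 116444736000000000 is the number of such intervals between 1601-01-01 and the
# Unix epoch (1970-01-01); 10,000,000 intervals make up one second.
filetime = 133197984000000000
unix_seconds = (filetime - 116444736000000000) // 10_000_000
print(datetime.fromtimestamp(unix_seconds, tz=timezone.utc))  # 2023-02-02 08:00:00+00:00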
This isn't likely to float your boat, but the Advanced Data Operations connector can do it for you.
The unfortunate piece of the puzzle is that (at this stage) it doesn't just work as is, but rest assured that this functionality is coming.
Meaning, you need to do some trickery if you want to use it to do what you want.
By this I mean, if you use the Xml to Json operation, you can use the built in functions that come with the conversion to do it for you.
This is an example of what I mean (screenshot omitted).
You can see that I have constructed some XML that is then passed into the Data parameter. That XML contains your Windows file time value.
I have then set up the Map Object to take that value and use the built-in ADO function FromWindowsFileTime to convert it to a DateTime value.
The Primary Loop at Element is the XPath query that will make the selection to return the relevant values to loop over.
The result is this (screenshot omitted).
Disclaimer: I should point out, this is due to drop in preview sometime in the middle of Jan 2023.
They have another operation in development that will allow you to do this a lot more easily, but for now, this is your easiest and cheapest option.
This kind of thing is also available in the Transform and Expert operations, but that's the next pricing tier.

Import old data from Postgres to Elasticsearch

I have a lot of data in my Postgres database (on a remote server). This is the data of the past year, and I want to push it to Elasticsearch now.
The data has a time field in this format: 2016-09-07 19:26:36.817039+00.
I want this to be the time field (@timestamp) in Elasticsearch, so that I can view it in Kibana and see some visualizations over the last year.
I need help on how to push all this data efficiently. I cannot figure out how to get all this data out of Postgres.
I know we can ingest data via the JDBC plugin, but I think I cannot create my @timestamp field with that.
I also know about ZomboDB, but I'm not sure whether it also lets me specify my own time field.
Also, there is a lot of data, so I am looking for an efficient solution.
Suggestions are welcome.
I know we can ingest data via the JDBC plugin, but I think I cannot create my @timestamp field with that.
This should be doable with Logstash. A good starting point is probably this blog post. And remember that Logstash always consists of 3 parts:
Input: the JDBC input. If you only need to import once, skip the schedule; otherwise set the right timing in cron syntax.
Filter: This one is not part of the blog post. You will need to use the date filter to set the right @timestamp value; an example is added at the end.
Output: This is simply the Elasticsearch output.
This will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:
date {
  # In Joda-Time patterns, MM is the month and mm is minutes; yyyy is the year
  match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
  remove_field => ["your_date_field"] # remove the now redundant field, since we're storing it in @timestamp (the default target of date)
}
If you're concerned about performance:
You will need to set the right jdbc_fetch_size.
Elasticsearch output is batched by default.
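Putting the three parts together, a minimal pipeline sketch could look like the following. The connection details, table name, and your_date_field column are placeholders, and the date pattern is only an assumption that may need tuning against the exact string the JDBC input produces (if the column already arrives as a timestamp object, the date filter can be dropped entirely):

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db-host:5432/mydb"  # placeholder connection
    jdbc_user => "user"
    jdbc_password => "secret"
    jdbc_driver_library => "/path/to/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * FROM my_table"                            # placeholder table
    jdbc_fetch_size => 10000                                         # tune for throughput
  }
}
filter {
  date {
    match => ["your_date_field", "yyyy-MM-dd HH:mm:ss.SSSSSSZZ", "ISO8601"]
    remove_field => ["your_date_field"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index-%{+YYYY.MM.dd}"
  }
}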

How to delete a single value from graphite's whisper data?

I need to delete selected values from a Graphite Whisper data set. It is possible to overwrite a single value just by sending a new value, or to delete the whole set by deleting the .wsp file, but what I need to do is delete just one (or several) selected values, i.e. reset them to the same state as if they had never been written (undefined; Graphite returns nulls). Overwriting doesn't do that.
How to do it? (Programmatically is ok)
See also:
How to cleanup the graphite whisper's data?
Removing spikes from Graphite due to erroneous data
Graphite (Whisper) usually ships with the whisper-update utility.
You can use it to modify the content of a wsp file:
whisper-update.py [options] path timestamp:value [timestamp:value]*
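For example, to overwrite two data points in one series (the path and timestamps here are hypothetical):
whisper-update.py /opt/graphite/storage/whisper/servers/web01/load.wsp 1475020800:0.0 1475020860:0.0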
If the timestamp you want to modify is recent (as defined by carbon), you may want to wait or shut down your carbon-cache daemons.

How to read database (JDBC Request) result into variables in JMeter

I want to read a database result into variables so I can use them in later requests.
How can I do it?
What if I want to return multiple columns, or even rows, from the database? Can I loop over the returned table the same way I can with the CSV Data Set Config?
--edit--
OK, I found this solution that uses a regular expression to parse the response, but this solution and others like it don't work for me, because they require me to change my SQL queries so that JMeter can parse them more "easily". I'm using JMeter for load testing, and the last thing I want is to maintain two different versions of the code, one for testing and one for runtime.
Is there a "specific" JDBC Request solution that enables me to read the result into variables using the concept of result sets and columns?
Using the Regular Expression Extractor shouldn't affect what your SQL statement looks like. If you need to control which part of the response you store in a variable, use a Beanshell sampler with Java code to parse the response and store it into a variable.
You can loop through the returned table by using a ForEach Controller, referencing the variable name from the regular expression. Make sure that in your regular expression you set the match number to -1 to capture every possible match.
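As a rough sketch of that Beanshell idea (the row_ variable prefix is made up; the script sits in a Beanshell/JSR223 sampler or post-processor placed after the JDBC Request and assumes one row per line in the response):

// Split the JDBC Request's response into rows and expose them as numbered
// JMeter variables (row_1, row_2, ...) that a ForEach Controller can iterate over.
String response = ctx.getPreviousResult().getResponseDataAsString();
String[] rows = response.split("\n");
for (int i = 0; i < rows.length; i++) {
    vars.put("row_" + (i + 1), rows[i].trim());
}
vars.put("row_matchNr", String.valueOf(rows.length)); // same counter convention a Regular Expression Extractor with match -1 produces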

How do I store a signature block, including formatting, in a SQL Server table?

I've been assigned the task of creating a table that stores an email signature for each username. The question is, how should I store the signature block? I could use a regular varchar type, but then how do I store the formatting metadata?
Any ideas or suggestions would be welcome.
Thanks!
Another idea I had was that you could design a specific email signature template, then let people specify fields such as username, quote, avatar, alignment, etc., and have them modify their signature in a "signature editor". This way you could store just the "data" and not the rendering, so you could store something like the following:
<signature>
  <username>chama</username>
  <avatar href="http://url to my image"/>
  <quote>A bird in the hand is not in the nest</quote>
</signature>
and it could look something like:
Chama
A bird in the hand is not in the nest
Use varchar(max), or whatever length limit is appropriate.
Otherwise, the only real concern is that you might want to make sure the HTML is HTML-encoded before you stick it in the database (i.e., replace < with &lt;, etc.). Not sure what you're using, but some tools have a setting so you don't have to do it manually.
Other things you can do besides or in addition to HTML-encoding:
1) Restrict the formatting tags to some pre-defined set (i.e., search/replace tags you don't want before doing the insert). You can manage this in your DB stored procedure, or better yet, in your front end (if you have control over that).
2) Disqualify attempts to insert data if they include certain tags (like <script>, etc.).
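To make the varchar(max) suggestion concrete, here is a minimal T-SQL sketch (the table and column names are invented for illustration):

CREATE TABLE dbo.UserSignature (
    Username      nvarchar(128) NOT NULL PRIMARY KEY,  -- one signature per username
    SignatureHtml nvarchar(max) NOT NULL,              -- HTML-encoded, formatted signature block
    UpdatedAt     datetime2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);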
HTML, RTF, XML. There are multiple standard choices.
Note: an "email signature" is NOT a "digital signature". The term digital signature has a specific meaning: for email, it is a signature used to verify that the message comes from the real sender and has not been tampered with.
I'd suggest going with your initial thought: varchar(max). This will allow you to store signatures that are ASCII-based, which includes plaintext, RTF, or HTML signatures.
If users want to embed images (i.e. not a link to an image), then you'd have to determine a way for the caller to convert those images to Base64 or another encoding before storing, and to decode them after reading from your table.
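As one hedged illustration of that last point, an embedded image could be carried inside the stored HTML itself as a Base64 data URI, e.g. in Python (the file name is hypothetical):

import base64

with open("avatar.png", "rb") as f:  # hypothetical image file
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode("ascii")
signature_html = '<img src="' + data_uri + '" alt="avatar"/>'  # embed in the stored signature HTML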
Based on what I'm finding, you have basically two options:
1) Convert your formatted signature data to binary and store it as a BLOB.
2) Instead of saving the signatures themselves in the DB, save them as files somewhere and store a reference to each file's location in the DB.
