SQL Server String Manipulation for URLs? - sql-server

I need to append a paramter-value 'xval=9' to all non-blank SQL server column values in a multi-million row table. The column contains URLs and they have a random amount of "querystring" parameters appended to the column. So when I append, I may need to append '?xval=9' or I may need to append '&val=9', depending on if parameters already exist.
So the URL values could like like any of these:
http://example.com/example
http://example.com/example/?aval=1
http://example.com/example/?aval=1&bval=2
http://example.com/example/index.html?aval=1&bval=2
'aval' and 'bval' are just samples, really any kind of key/value pair might be on the end of the URL.
What is the smartest pure-TSQL way to manipulate that, hopefully utilizing some kind of indexing?
Thanks.

Do that on Presentation or Model layer, not on Data layer.
i.e. read all data and manipulate using C# or other language you use.

Maybe this should work
SELECT CASE CHARINDEX('?', Url) WHEN 0 THEN Url+'?foo=boo' ELSE Url+'&foo=boo' END AS Url FROM Whatever

Related

import old data from postgres to elasticsearch

I have a lot of data in my postgres database( on a remote). This is the data of the past 1 year, and I want to push it to elasticsearch now.
The data has a time field in it in this format 2016-09-07 19:26:36.817039+00.
I want this to be the timefield(#timestamp) in elasticsearch. So that I can view it in kibana, and see some visualizations over the last year.
I need help on how do I push all this data efficiently. I cannot get that how do I get all this data from postgres.
I know we can inject data via jdbc plugin, but I think I cannot create my #timestamp field with that.
I also know about zombodb but not sure if that also gives me feature to give my own timefield.
Also, the data is in bulk, so I am looking for an efficient solution
I need help on how I can do this. So, suggestions are welcome.
I know we can inject data via jdbc plugin, but I think I cannot create
my #timestamp field with that.
This should be doable with Logstash. The first starting point should probably be this blog post. And remember that Logstash always consists of 3 parts:
Input: JDBC input. If you only need to import once, skip the schedule otherwise set the right timing in cron syntax.
Filter: This one is not part of the blog post. You will need to use the Date filter to set the right #timestamp value — adding an example at the end.
Output: This is simply the Elasticsearch output.
This will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:
date {
match => ["your_date_field", "dd-mm-YYYY HH:mm:ss"]
remove_field => "your_date_field" # Remove now redundant field, since we're storing it in #timestamp (the default target of date)
}
If you're concerned with the performance:
You will need to set the right jdbc_fetch_size.
Elasticsearch output is batched by default.

SQL Server validating postcodes

I have a table containing postcodes but there is no validation built in to the entry form so there is no consistency in the way they are stored in the database, sample below:
ID Postcode
001742 B5
001745
001746
001748 DY3
001750
001751
001768 B276LL
001774 B339HY
001776 B339QY
001780 WR51DD
I want to use these postcode to map the distance from a central point but before I can do that I need to put them into a valid format and filter out any blanks or incomplete postcodes.
I had considered using
left(postcode,3) + ' ' + right(postcode,3)
To correct the formatting but this wouldn't work for postcodes like 'M6 8HD'
My aim is to get the list of postcodes in a valid format but I don't know how to account for different lengths of postcode. Is this there a way to do this in SQL Server?
As discussed in the comments, sometimes looking at a problem the other way around presents a far simpler solution.
You have a list of arbitrary input provided by users, which frequently doesn't contain the correct spacing. You also have a list of valid postcodes which are correctly spaced.
You're trying to solve the problem of finding the correct place to insert spaces into your arbitrary inputs to make them match the list of valid codes, and this is extremely difficult to do in practice.
However, performing the opposite task - removing the spaces from the valid postcodes - is remarkably easy to do. So that is what I'd suggest doing.
In our most recent round of data modelling, we have modelled addresses with two postcode columns - PostCode containing the postcode as provided from whatever sources, and PostCodeNoSpace, a computed column which strips whitespace characters from PostCode. We use the latter column for e.g. searches based on user input. You may want to do something similar with your list of Valid postcodes, if you're keeping it around permanently - so that you can perform easy matches/lookups and then translate those matches back into a version that has spaces - which is actually a solution to the original question posed!

Remove Duplicate adjacent Sub-String from String in Microsoft SQL Server

I am using SQL Server 2008 and I have a column in a table, which has values like below. It basically shows departure and arrival information.
-->Heathrow/Dublin*Dublin/Heathrow
-->Gatwick/Liverpool*Liverpool/Carlisle *Carlisle/Gatwick
-->Heathrow/Dublin*Liverpool/Heathrow
(The 3rd example shown above is slightly different where the person did not depart from Dublin, instead departed from a Liverpool).
This makes the column too lengthy, and I want to remove only the adjacent duplicates, so the information can be shown like below:
-->Heathrow/Dublin/Heathrow
-->Gatwick/Liverpool/Carlisle/Gatwick
-->Heathrow/Dublin***Liverpool/Heathrow
So, this would still show the correct travel route, but omits only the contiguous duplicates. Also, in the 3rd case, since the departure and arrival information location is not the same, Iwould like to show it as ***.
I found a post here that removes all duplicates (Find and Remove Repeated Substrings) but this is slightly different from the solution that I need.
Could someone share any thoughts please?
The first step is to adapt the process defined in the following link so that it splits based on /:
T-SQL split string
This returns a table which you would then loop through checking if the value contains an *. In that case you would get the text values before and after the * and compare them. Use CHARINDEX to get the position of the *, and SUBSTRING to get the values before and after. Once you have those check both values and append to your output string accordingly.
So you have a database column that contains this text string? Is your concern to display the data to the user in a new format, or to update the data in your database table with a new value?
Do you have access to the original data from which this text string was built? It would probably be easier to re-create the string in the format you desire than it would be to edit the existing string programmatically.
If you don't have access to this data, it would probably be a lot simpler to update your data (or reformat it for display) if you do the string manipulation in a high-level language such as c# or java.
If you're reformatting it for display, write the string manipulation code in whatever language is appropriate, right before displaying it. If you're updating your table, you could write a program to process the table, reading each record, building the replacement string, and updating the record before moving on to the next one.
The bottom line is that T-SQL is just not a good language for doing this sort of string examination and manipulation. If you can build a fresh string from the original data, or do your manipulation in a high-level language, you'll have an easier job of it and end up with more maintainable code.
I wrote a code for the first example you gave. You still need to
improve it for the rest ...
DECLARE #STR VARCHAR(50)='Heathrow/Dublin*Dublin/Heathrow'
IF (SELECT SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1)) =
(SELECT SUBSTRING(#STR,CHARINDEX('*',#STR)+1,LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1))))
BEGIN
SELECT STUFF(#STR,CHARINDEX('*',#STR),LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1))+1,'')
END
ELSE
BEGIN
SELECT STUFF(#STR,CHARINDEX('*',#STR),LEN(SUBSTRING(#STR,CHARINDEX('*',#STR)+1,LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1)))),'***')
END

python encoding characters in jinja2

It's a similar one with one of my other questions. I try to solve all the side effects of the first one.
I have stored few non-ascii characters on my database. If I make few "encoding-decoding" stuffs, I managed to work with the database queries. But I have another problem.
If I use the
self.response.out.write(mystring)
in one of my entities ( looks like this -> u'\u0395\u03c0\u03b9\u03c3\u03c4\u03ae\u03bc\u03b5\u03c2')
I can see it without any problem. But, I have a javascript which create a graph and needs a list with those strings. If I pass the list to the javascript like it is from the database, the javascript doesn't work at all. If I use the
tag2 = tag.encode("utf-8")
for every entity on the list and then pass the new list, I see all the non-ascii characters like this one -> ÎÏιÏÏήμεÏ

How do I store a signature block, including formatting, in a Sql server table?

I've been assigned the task of creating a table that stores an email signature for each username. The question is, how should I store the signature block? I could use a regular varchar type, but then how do I store the formatting metadata?
Any ideas or suggestions would be welcome.
Thanks!
Another idea I had was that you could design a specific email signature template, and then let people specify fields, such as Username, quote, avatar, alignment etc, and then have them modify their signature in a "signature editor". This way you could just store the "data" and not the rendering. so you could store something like follows:
<signature>
<username>chama</username>
<avatar href="http://url to my image"/>
<quote>A bird in the hand is not in the nest</quote>
</signature>
and it could look something like:
Chama
A bird in the hand is not in the nest
use varchar(max), or whatever length limit is appropriate.
otherwise, the only real concern is that you might want to make sure the html is html-encoded before you stick it in the database. (i.e., replace < with <, etc.) Not sure what you're using, but some tools have a setting so you don't have to do it manually.
other things you can do besides / in addition to html-encoding
1) restrict the formatting tags to some pre-defined set (i.e., search/replace tags you don't want before doing the insert. You can manage this in your db stored procedure, or better yet, in your front-end (if you have control over that).
2) disqualify attempts to insert data if they include certain tags (like '<script>', etc.)
HTML, RTF, XML. The stanard choices are multiple.
Note: "email signature" is NOT "digital signature". The term digital signature has a specific meaning and means a SIGNATURE to make sure - for email - it comes from th real sender and has not been tampered with.
I'd suggest going with your initial thought -- varchar(max). This will allow you to store signatures that are ASCII based. This includes plaintext, RTF or HTML signatures.
If users want to embed images (i.e. not a link to an image), then you'd have to determine a way for the caller to convert those images to Base64 or other before storing and after reading from your table.
Based on what I'm finding, you have basically two options:
1) Convert your formatted signature data to Binary and store it as a BLOB.
2) Instead of saving the signature itself in the DB, save them as files somewhere and store a reference to that file location in the DB.

Resources