How to convert Svmlight data to a database - database

I am using data in Svmlight format (you may know it as libsvm)
this is the format
I tried to create a SQL database to store it. The issue is that the format is sparse: in a regular table with one column per feature, storing the data takes a lot of space, and if I store it in a sparse form (one string per row) I can't query rows by their column-wise content (e.g. I need to query for all the rows that contain a value for feature #).
I am looking for a straightforward way to convert it to a database that will make filtering and querying faster and simpler.
Can anyone point me to a suitable database solution and, if possible, a ready-made utility that does the conversion?
Thanks!
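One possible approach, shown here only as a minimal sketch, is to store each non-zero entry as a (row, feature, value) triple so the table stays sparse but is still queryable by feature. SQLite is used as a stand-in database, and the table names `labels` and `features`, the helper names, and the sample lines are all assumptions made for illustration.

```python
import sqlite3

def load_svmlight(lines, conn):
    """Store every non-zero entry as one (row_id, feature_id, value) triple."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labels (row_id INTEGER PRIMARY KEY, label REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " row_id INTEGER, feature_id INTEGER, value REAL,"
        " PRIMARY KEY (row_id, feature_id))"
    )
    # Index by feature so "rows that have feature N" avoids a full scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_feature ON features (feature_id)"
    )
    row_id = 0
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        label, *pairs = line.split()
        conn.execute("INSERT INTO labels VALUES (?, ?)", (row_id, float(label)))
        for pair in pairs:
            key, value = pair.split(":", 1)
            if key == "qid":  # ranking query ids are skipped in this sketch
                continue
            conn.execute(
                "INSERT INTO features VALUES (?, ?, ?)",
                (row_id, int(key), float(value)),
            )
        row_id += 1
    conn.commit()

def rows_with_feature(conn, feature_id):
    """Return the ids of all rows that carry a value for the given feature."""
    cur = conn.execute(
        "SELECT row_id FROM features WHERE feature_id = ? ORDER BY row_id",
        (feature_id,),
    )
    return [r[0] for r in cur]

conn = sqlite3.connect(":memory:")
sample = ["1 1:0.5 3:2.0", "-1 2:1.0", "1 3:0.25 7:4.0"]  # made-up sample data
load_svmlight(sample, conn)
print(rows_with_feature(conn, 3))  # [0, 2]
```

The same triple layout should carry over to other SQL databases; only the connection code and type names would differ.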

Related

Why can I not use the format function when converting Jet-SQL to T-SQL?

I was trying to convert one of my MS Access queries containing Format to a SQL Server view. I have my view connected to MS Access as linked tables. I was looking at this MS Access to SQL Server cheat sheet to convert Jet-SQL to T-SQL.
The cheat sheet says:
Access: SELECT Format(Value, FormatSpecification) (note: this always returns a string value)
T-SQL: Do not do this in T-SQL; format data at your front-end application or report
I cannot format data at my front end because the SQL Server view is linked as linked tables. I cannot have format function in tables.
The cheat sheet does not provide any explanation on why not to do this in T-SQL.
What is the reason behind not using format when converting Jet-SQL to T-SQL?
Obviously, you can format values in T-SQL using the Format function, which only has minor differences with the Access format function.
Generally, though, you shouldn't.
There are multiple reasons why it's discouraged:
Formatted strings are nearly always larger than unformatted dates/numbers, causing additional overhead when transmitting results
If you format in the application layer, the unformatted value is available to you in the application layer to validate/do calculations with/use for conditional formatting/etc. If you format the data in the database layer, you can't do this without casting back to a date (which is a really bad practice).
If you want variable formatting based on things like locale settings, it's way easier to format in the application layer.
It's certainly not a limitation of SQL Server. It's just a bad practice to use it.
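As a rough illustration of the application-layer argument, the sketch below keeps the stored value unformatted and only turns it into a string at display time. It uses Python with SQLite as a stand-in for the real front end and server; the `orders` table, the `format_date` helper, and the two style names are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2024-03-05')")

# The query returns the raw ISO value; no formatting happens in SQL.
raw = conn.execute("SELECT placed_on FROM orders WHERE id = 1").fetchone()[0]
placed_on = date.fromisoformat(raw)

def format_date(d, style):
    """Format a date for display; the styles here are illustrative only."""
    patterns = {"us": "%m/%d/%Y", "eu": "%d.%m.%Y"}
    return d.strftime(patterns[style])

# The unformatted value is still available for calculations and checks.
is_recent = placed_on.year >= 2024

print(format_date(placed_on, "us"))  # 03/05/2024
print(format_date(placed_on, "eu"))  # 05.03.2024
```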

SQL Server 2014 - Insert xml data into varbinary field

In my stored procedure, I am creating an XML file which has the potential to be very large, > 1GB in size. The data needs to be inserted into a varbinary column and I was wondering what the most efficient method of doing this is in SQL Server 2014?
I was storing it in an xml column but have been asked to move it to this new column as a result of a decision outside of my control
If you have the slightest chance to speak with these persons, you should do this!
You must be aware that XML is not stored as the string representation you see, but as a hierarchically organized tree. Reading this data or manipulating it is astonishingly fast! If you store the XML as BLOB, you will keep it in its string format (hopefully this is unicode/UCS-2!). Reading this data will need a cast to NVARCHAR(MAX) and then to XML, which means a full parse of the whole document to get the hierarchy tree. Only when this is done can you use XML data type methods (like .value or .nodes). You will need this very expensive process over and over and over and ...
Especially in cases of huge XMLs (or - even worse - many of them) this is a really bad decision!! Why should one do this??? It will take roughly the same amount of storage space.
The only thing you will get is bad performance! And you will be the one who has to repair this later...
VARBINARY is the appropriate type for data, where you do not care what's inside (e.g. pictures). If these XMLs are just plain archive data and you do not want to read or manipulate them, this can be a choice. But there is no advantage at all!
I would look into using a File Table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
And Check out this for inserting blobs: How to insert a blob into a database using sql server management studio

Storing Serialized Information In SQL Server using F#

I am currently working on a project in F# that takes in data from Excel spreadsheets, determines if it is compatible with an existing table in SQL Server, and then adds the relevant rows to the existing table.
Some of the data I am working with is more specific than the types provided by T-SQL. That is, T-SQL has a type "date", but I need to distinguish between sets of dates that are at the beginning of each month or the end of each month. This same logic applies to many other types as well. If I have types:
Date(Beginning)
Date(End)
they will both be converted to the T-SQL type "date" before being added to the table, therefore erasing some of the more specific information.
In order to solve this problem, I am keeping a log of the serialized types in F#, along with which column number in the SQL Server table they apply to. My question is: is there any way to store this log somewhere internally in SQL Server so that I can access it and compare the serialized types of the incoming data to the serialized types of the data that already exists in the table before making new inserts?
Keeping metadata outside of the DB and maintaining them manually makes your DB "expensive" to manage plus increases the risk of errors that you might not even detect until something bad happens.
If you have control over the table schema, there are at least a couple of simple options. For example, you can add a column that stores the type info. For something simple with just a couple of possible values as you described, just add a new column to store the actual type value. Update the F# code to de-serialize the source into separate DATE and type (BEGINNING/END) values which are then inserted to the table. Simple, easy to maintain and easily consumed.
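A minimal sketch of the extra-column option follows. It is written in Python with SQLite rather than F# and SQL Server, and the `periods` table, the `date_kind` column, and the helper names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE periods ("
    " id INTEGER PRIMARY KEY,"
    " period_date TEXT NOT NULL,"
    # The subtype lives next to the value, so it is never lost on insert.
    " date_kind TEXT NOT NULL CHECK (date_kind IN ('BEGINNING', 'END')))"
)

def insert_period(conn, period_date, date_kind):
    """Insert the plain date together with its more specific subtype."""
    conn.execute(
        "INSERT INTO periods (period_date, date_kind) VALUES (?, ?)",
        (period_date, date_kind),
    )

def existing_kinds(conn):
    """Return the set of subtypes already present in the table."""
    return {row[0] for row in conn.execute("SELECT DISTINCT date_kind FROM periods")}

insert_period(conn, "2024-01-01", "BEGINNING")
insert_period(conn, "2024-02-01", "BEGINNING")

# Before inserting new data, compare its subtype with what is already stored.
incoming_kind = "END"
compatible = existing_kinds(conn) <= {incoming_kind}
print(compatible)  # False: the table holds BEGINNING dates only
```

Because the subtype is stored in the table itself, the compatibility check is an ordinary query and no separate log has to be kept in sync.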
You could also create a user defined type for each date subtype but that can be confusing to another DBA/dev plus makes it more complicated when retrieving data from your application. This is generally not a good approach.
Yes, you can do that if you want to.

How to store XML result of WebService into SQL Server database?

We have got a .Net Client that calls a Webservice. We want to store the result in a SQL Server database.
I think we have two options here how to store the data, and I am a bit undecided as I can't see the pros and cons clearly: One would be to map the results into database fields. That would require us to have database fields corresponding to each possible result type, e.g. for each "normal" result type as well as those for faults.
On the other hand, we could store the resulting XML and query that via the SQL Server built in XML functions.
Personally, I am comfortable with dealing with both SQL and XML, so both look fine to me.
Are there any big pros and cons and what would I need to consider in terms of database design when trying to store the resulting XML for quite a few different possible Webservice operations? I was thinking about a result table for each operation that we call with different entries for the different possible outcomes / types and then store the XML in the right field, e.g. a fault in the fault field, a "normal" return type in the appropriate field etc.
We use a combination of both. XML for reference and detailed data, and text columns for fields you might search on. Searchable columns include order number, customer reference, ticket number. We just add them when we need them since you can extract them from the XML column.
I wouldn't recommend just the XML. If you store 10.000 messages a day, a query like:
select * from XmlLogging with (nolock) where Response like '%Order12%'
can become slow and interfere with other queries. You also can't display the logging in a GUI because retrieval is too slow.
I wouldn't recommend just the text columns either. If the XML format changes, you'd get an empty column. That's hard to troubleshoot without the XML message. In addition, if you need to "replay" the message stream, that's a lot easier with the XML messages. Few requirements demand replay, but it's really helpful when repairing the fallout of production problems.
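The combined approach could look roughly like the sketch below: the full XML is kept for reference and replay, and one searchable value is extracted into its own indexed column. Python and SQLite stand in for the .NET client and SQL Server here, and the `xml_logging` table, the `OrderNumber` element, and the sample messages are assumptions, not the real service schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_logging ("
    " id INTEGER PRIMARY KEY,"
    " order_number TEXT,"          # searchable copy, may be NULL
    " response TEXT NOT NULL)"     # full XML message, kept for replay
)
conn.execute("CREATE INDEX idx_order_number ON xml_logging (order_number)")

def log_response(conn, xml_text):
    """Store the raw XML plus the order number extracted from it."""
    root = ET.fromstring(xml_text)
    order_number = root.findtext("OrderNumber")  # None if the element is absent
    conn.execute(
        "INSERT INTO xml_logging (order_number, response) VALUES (?, ?)",
        (order_number, xml_text),
    )

log_response(
    conn,
    "<Response><OrderNumber>Order12</OrderNumber><Status>OK</Status></Response>",
)
log_response(conn, "<Fault><Message>timeout</Message></Fault>")

# An equality lookup on the indexed column replaces LIKE '%Order12%' on the XML.
hits = conn.execute(
    "SELECT response FROM xml_logging WHERE order_number = ?", ("Order12",)
).fetchall()
print(len(hits))  # 1
```

If the XML format changes and the extraction starts returning nothing, the stored message is still there to re-extract from.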

storing long text

What is the best way to store long texts (articles) in a database? It doesn't need to be searchable.
I want to allow people to read the first chapter of every book in my bookstore. Dumping it into a database field makes it difficult to style paragraphs using CSS.
EDIT: access database
If it is SQL Server 2005, use VARCHAR(MAX).
EDIT:
It seems he said Access, so I would go with Memo:
Up to 63,999 characters. (If the Memo field is manipulated through DAO and only text and numbers [not binary data] will be stored in it, then the size of the Memo field is limited by the size of the database.)
or OLE Object (if you can)
An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked (OLE/DDE link: A connection between an OLE object and its OLE server, or between a Dynamic Data Exchange (DDE) source document and a destination document.) to or embedded (embed: To insert a copy of an OLE object from another application. The source of the object, called the OLE server, can be any application that supports object linking and embedding. Changes to an embedded object are not reflected in the original object.) in a Microsoft Access table.
Up to 1 gigabyte (limited by available disk space)
you have several options:
store it as a long single string with no formatting, which will look bland on the screen.
store it as a long single string with embedded html and css, which will be a bad choice if you ever want to make your site have a different look/feel.
normalize it so you have tables to store books, chapters, paragraphs, etc. you could then format and style the text as you load it into the application.
The main difference between long text (CLOB / TEXT / VARCHAR(MAX)) and long data (BLOB / IMAGE / VARBINARY(MAX)) is that the former is subject to character set conversions while the latter is not.
If you need to make character set conversion on the database side, use CLOB and similar.
If you always want to retrieve your data as you stored it, byte-to-byte (as opposed to character-to-character), use BLOB and similar.
I don't know which database you're using, but if the text doesn't need to be searchable, then you can simply store the HTML-formatted text (for instance, a value coming from an FCKEditor or components like this). If you also need searchability, then you can store both HTML and plain text in two separate fields.
Fields can be nvarchar(MAX) if you use MS SQL Server 2008 or any equivalent datatype on other databases.
EDIT:
Seems you're using Access, so go for Memo data type!
If you decide to store HTML, consider storing only generic markup (div, p) to divide your text, then later apply CSS formatting by wrapping the stored text within another div that specifies formatting classes for its child elements.
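A small sketch of that idea, assuming the chapter arrives as plain text with blank lines between paragraphs: only generic `<p>` markup is stored, and the styling hook is added when the text is rendered. The function names and the `chapter-preview` class are hypothetical.

```python
from html import escape

def to_generic_markup(text):
    """Turn blank-line separated plain text into bare <p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # escape() keeps stray markup or script in the source text from being stored as HTML.
    return "".join(f"<p>{escape(p)}</p>" for p in paragraphs)

def render(stored_markup, css_class="chapter-preview"):
    """Wrap the stored markup in a div whose class carries the styling."""
    return f'<div class="{css_class}">{stored_markup}</div>'

stored = to_generic_markup("First paragraph.\n\nSecond & last paragraph.")
print(render(stored))
# <div class="chapter-preview"><p>First paragraph.</p><p>Second &amp; last paragraph.</p></div>
```

Changing the look of the site then only means changing the CSS rules for the wrapper class, not the stored text.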
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
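The pointer approach might be sketched as follows, with Python and SQLite standing in for the actual application and database. The `chapters` table, the file naming scheme, and the temporary storage directory are assumptions for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

storage_dir = Path(tempfile.mkdtemp())  # stand-in for a real documents folder
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chapters (book_id INTEGER PRIMARY KEY, file_path TEXT NOT NULL)"
)

def save_chapter(conn, book_id, text):
    """Write the text to a file and store only its path in the database."""
    path = storage_dir / f"book_{book_id}_chapter_1.txt"
    path.write_text(text, encoding="utf-8")
    conn.execute("INSERT INTO chapters VALUES (?, ?)", (book_id, str(path)))

def load_chapter(conn, book_id):
    """Look up the pointer, then read the document from the file system."""
    row = conn.execute(
        "SELECT file_path FROM chapters WHERE book_id = ?", (book_id,)
    ).fetchone()
    return Path(row[0]).read_text(encoding="utf-8")

save_chapter(conn, 1, "It was a dark and stormy night.")
print(load_chapter(conn, 1))  # It was a dark and stormy night.
```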
Use a CLOB.
For SQL Server
TEXT / NTEXT for SQL Server 2000
VARCHAR(MAX) / NVARCHAR(MAX) for SQL Server 2005 onwards
I would propose storing the first chapter as pdf file. This is secure and allows for good formatting. Then use a blob, clob, varchar, or text field depending on your product (see the other answers).
Or you could use images and look into something like Amazon's "look inside". It would work with the same db techniques.
Alternatively you could use something like markup.
I personally do not like to put html in my database. Even if it is only for output. Too easy to put in some javascript. But maybe I'm just too cautious.
The following applies to Jet 4.0 only, being the version of the Access Database Engine in the era Access2000 to Access2003 inclusive:
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
If you are using ACE, being the version of the Access Database Engine in the Access2007 era, the Attachment data type would be an option; however, I don't really know how it works and I've never used it, so I can't recommend it nor say whether it's better or worse for this purpose. I'm also wary of new data types in the first release of a major version of the Access Database Engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4 and don't want to commit to something that may never work properly. The Attachment type in the ACCDB format was introduced for SharePoint compatibility, and that outside dependency is something that gives me pause. Will the ACCDB data type change someday if SharePoint changes the way it works? I'm not sure I'd want to take that risk.
Put it in a TEXT field, and put it with their <p> so you'll be able to style paragraphs.
As it doesn't need to be searchable, it won't impact your sql performance.
