Which file format is smallest when we have the same data?

It would be really helpful if anyone could suggest the smallest file format for a given set of data: tab-separated values (TSV), comma-separated values (CSV), plain text with some other specific delimiter, or anything else.
We can then compress the files with GZip or 7-Zip once we know which format is smallest.

I have tried the JSON, BSON, YAML, Protocol Buffers, Avro, and XML formats.
YAML is human-readable like JSON, but it takes up far more space.
XML, unsurprisingly, also takes up far more space.
Protocol Buffers and Avro are smaller than CSV and TSV files, but the data is not human-readable.
My suggestion is to use JSON: it balances readability and size, and APIs to parse JSON easily are widely available.
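If you want to check this empirically, here is a small sketch that serializes the same records to CSV and JSON and compares raw and gzipped sizes (the records and field names are made up for illustration):

import csv, gzip, io, json

# Invented sample records, used only to compare encodings.
rows = [{"id": i, "name": f"user{i}", "score": i * 1.5} for i in range(1000)]

csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(rows)

encodings = {
    "csv": csv_buf.getvalue().encode("utf-8"),
    "json": json.dumps(rows).encode("utf-8"),
}
for name, raw in encodings.items():
    print(f"{name}: {len(raw)} bytes raw, {len(gzip.compress(raw))} bytes gzipped")

On data like this, CSV wins on raw size because it does not repeat the field names in every record; after gzip the gap narrows, since the repeated JSON keys compress very well.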

Related

How to extract an .owl file and save it to MySQL

I have a file, ontobible.owl. How do I extract that file and then save the data to MySQL (because I want to display data from ontobible.owl on a website)? Can anyone help me?
Edited:
Here is my ontobible.owl file (https://teamtrainit.com/ontobible.owl).
I've tried opening ontobible.owl with Sublime Text 3, and it contains entries like this:
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#HOS5_2">
<verseID>HOS5_2</verseID>
<verse_text>And the revolters are profound to make slaughter, though I have been a rebuker of them all.</verse_text>
</Verse>
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#2CH2_1">
<hasPerson rdf:resource="http://semanticbible.org/ns/2006/NTNames#god_1324"/>
<hasPerson rdf:resource="http://www.co-ode.org/roberts/family-tree.owl#solomon_2762"/>
<verseID>2CH2_1</verseID>
<verse_text>And Solomon determined to build an house for the name of the LORD, and an house for his kingdom.</verse_text>
</Verse>
How do I convert those XML tags to an array or JSON so I can save them to a MySQL database?
You have several options for extracting data from OWL:
1. Use the OWL API and write Java code (I think the OWL API is accessible from other languages too) to extract the data and pack it in the format you need. You can also extract data with SPARQL queries via the Jena API.
2. Install Protégé, open your file in Protégé, and save it in the JSON-LD format. This format is very similar to regular JSON, and you can easily transform it for your needs.
3. Install a Fuseki server, add your file, and extract the data from there using SPARQL queries.
I think the second option is the easiest to start with if you don't want to write queries or code, and it won't take long.
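If you'd rather stay outside Java entirely, here is a minimal sketch of the SPARQL route using Python's rdflib. It assumes the Verse class and its properties live in the ontobible namespace shown in the rdf:about URIs, which you should verify against the file's xmlns declarations:

import json
from rdflib import Graph

g = Graph()
g.parse("ontobible.owl", format="xml")  # the file is RDF/XML

query = """
PREFIX onto: <http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#>
SELECT ?id ?text WHERE {
    ?verse a onto:Verse ;
           onto:verseID ?id ;
           onto:verse_text ?text .
}
"""
rows = [{"verseID": str(r["id"]), "verse_text": str(r["text"])}
        for r in g.query(query)]
print(json.dumps(rows[:2], indent=2))  # one dict per verse, ready for MySQL inserts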

Chunk a comma-delimited flat file of 20+ MB in an Azure Logic App and convert to JSON - ActionResultsSizeLimitExceeded

I have a flat file in blob storage with the structure below, consisting of a header and body records. The file can grow to 20+ MB. I need to split this file into chunks of 4,000 records and convert them to JSON format.
"000","IN",04963,"xyz_abc",20210602,034425,278233
"803","IN","123456",0,"00002",0,1.519,"INR",1,
"803","IN","123456",0,"00004",0,1.579,"INR",1,
"803","IN","232323",0,"00002",0,1.519,"EUR",1,
"803","IN","232323",0,"00004",0,1.579,"EUR",1,
I am trying the approach below:
Step 1 - read the blob content and convert it to XML using an XML schema (using an integration account schema and flat file decoding)
Step 2 - chunk the XML every 4,000 records, convert it to the desired JSON format, and save it to the processed blob
But I am getting the issue below in Step 1 during flat file decoding, even though the file is only 20 MB and the restriction is 200 MB.
ActionResultsSizeLimitExceeded. The action 'Flat_File_Decoding' has results size of more that '228151576' bytes. This exceeded the maximum size '209715200' allowed.
Any help will be appreciated
Please note that the error is about the size of the results, not the size of the source file.
A 20 MB CSV file can easily turn into a 200+ MB XML file, depending on the length of the tags used in the XML.
For example, while the first line in your sample is only 50 characters, the following linearized XML containing the same data is 455 characters:
<?xml version="1.0" encoding="utf-8"?><ReagllyLongTag00 xmlns="http://ReallyLongNamespaceWellNotReallyLong"><ReallyLongRecord xmlns=""><ReallyLongTag01>000</ReallyLongTag01><ReallyLongTag02>IN</ReallyLongTag02><ReallyLongTag03>04963</ReallyLongTag03><ReallyLongTag04>xyz_abc</ReallyLongTag04><ReallyLongTag05>20210602</ReallyLongTag05><ReallyLongTag06>034425</ReallyLongTag06><ReallyLongTag07>278233</ReallyLongTag07></ReallyLongRecord></ReagllyLongTag00>
Azure Functions tend to manipulate data from large files much better than Logic Apps.
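For what it's worth, the chunking itself is simple enough to do in an Azure Function without the XML intermediate. A minimal Python sketch (the field names are invented, since the question does not name the columns):

import csv, json
from itertools import islice

FIELDS = ["recordType", "country", "account", "flag", "code",
          "status", "rate", "currency", "quantity"]  # hypothetical names
CHUNK = 4000

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    next(reader)                        # skip the "000" header record
    part = 0
    while True:
        batch = list(islice(reader, CHUNK))
        if not batch:
            break
        # In a real function, each part would be written back to blob storage.
        with open(f"processed_{part:03d}.json", "w") as out:
            json.dump([dict(zip(FIELDS, row)) for row in batch], out)
        part += 1

Because the file is streamed line by line, memory use stays proportional to one 4,000-record chunk rather than to the whole file.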

Perform a find and replace on thousands of Word files stored in a varbinary column of SQL Server DB

I've got a SQL Server database which has a table which contains a varbinary column.
This table has tens of thousands of rows.
This varbinary column contains documents: 85% in MS Word .doc format, 10% in .docx format and the rest in .pdf and .rtf.
There is a particular string which appears in all of these documents (an email address). I'd like to replace this string in all of these documents with a new string (an updated email address). (To be clear: the string to find and the string to replace it with are the same in all cases.)
Ideally I'd like to be able to do this for all the file types but if it is only possible for .doc and .docx that would at least be the bulk of the problem solved.
I'd also like not to have to install MS Word if possible but appreciate this may be necessary.
Thanks!
You can convert the value, replace the string, and convert it back to varbinary. Use the link below for replacing a varbinary value:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=76304
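For what it's worth, here is a minimal client-side sketch of that round trip in Python with pyodbc (the table, column, and connection details are hypothetical). Note that a raw byte replace only finds the address if the document format stores it as those literal bytes, which is typically true of .rtf but not of the zipped .docx container:

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=.;DATABASE=Docs;Trusted_Connection=yes")
read, write = conn.cursor(), conn.cursor()
old, new = b"old@example.com", b"new@example.com"

# Hypothetical Documents(DocId, Content) table with a varbinary Content column.
for doc_id, blob in read.execute("SELECT DocId, Content FROM Documents").fetchall():
    if blob and old in blob:
        write.execute("UPDATE Documents SET Content = ? WHERE DocId = ?",
                      blob.replace(old, new), doc_id)
conn.commit()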
You can use "Managing FILESTREAM Data by Using Win32" (the Win32 streaming API) to get the BLOB into a variable.
This way you get the contents of your BLOB in a variable, as if it were opened in Notepad. Use Replace to update the .doc, .docx and .rtf files. I do not know how to update PDFs.
That link contains C# code that loads the BLOB into a variable in your C# code. You can then save it with the path, file name and extension derived from the DB as well. Here is a small quote of the code:
// Encoding used to convert between the BLOB bytes and a string.
Encoding unicode = Encoding.GetEncoding(0);

// Read the data from the FILESTREAM BLOB.
sqlFileStream.Seek(0L, SeekOrigin.Begin);
numBytes = sqlFileStream.Read(buffer, 0, buffer.Length);
string readData = unicode.GetString(buffer);
if (numBytes != 0)
    Console.WriteLine(readData);

// Here you have the contents of your BLOB as if opened in Notepad.
// Use Replace to update .doc, .docx and .rtf files.

// Write the string "EKG data." to the FILESTREAM BLOB.
// In your application this string would be replaced with
// the binary data that you want to write.
string someData = "EKG data.";
sqlFileStream.Write(unicode.GetBytes(someData.ToCharArray()),
    0,
    someData.Length);
See also Using FILESTREAM Storage in Client Applications.
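One caveat for .docx in particular: it is a ZIP archive of XML parts, so a byte-level Replace on the raw BLOB generally won't find the text. Here is a sketch of rewriting the main document part instead, using only Python's standard library (and assuming Word has not split the address across multiple XML runs, which it sometimes does):

import io, zipfile

def replace_in_docx(blob: bytes, old: str, new: str) -> bytes:
    # Copy every entry of the .docx ZIP, rewriting word/document.xml on the way.
    src = zipfile.ZipFile(io.BytesIO(blob))
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = data.replace(old.encode("utf-8"), new.encode("utf-8"))
            dst.writestr(item, data)
    return out.getvalue()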

SQL Server Bulk Import With Format File of UTF-8 Data

I have been referring to the following page:
http://msdn.microsoft.com/en-us/library/ms178129.aspx
I simply want to bulk import some data from a file that has Unicode characters. I have tried encoding the actual data file in UCS-2, UTF-8, etc., but nothing works. I have also modified the format file to use SQLNCHAR, but it still doesn't work and gives this error:
Bulk load data conversion error (truncation) for row 1, column 1
I think it has to do with this statement from the above link:
For a format file to work with a Unicode character data file, all the
input fields must be Unicode text strings (that is, either fixed-size
or character-terminated Unicode strings).
What exactly does this mean? I thought it meant every character needs to be a fixed 2 bytes, which encoding the file in UCS-2 should handle?
This blog post was really helpful and solved my problem:
http://blogs.msdn.com/b/joaol/archive/2008/11/27/bulk-insert-using-unicode-data-files.aspx
Something else to note - a Java class was generating the data file. In order for the above solution to work, the data file needed to be encoded in UTF-16LE, which can be set in the constructor of OutputStreamWriter (for example).
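The same idea in Python, in case the file is produced there instead (the "utf-16-le" codec writes no BOM, and each character becomes a fixed-width code unit, which is what SQLNCHAR fields in the format file expect):

# Hypothetical sample rows with tab-terminated fields.
rows = ["1\tJosé\n", "2\tMünchen\n"]
with open("data.txt", "w", encoding="utf-16-le", newline="") as f:
    f.writelines(rows)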
In SQL Server 2012 I imported a .csv file containing special Spanish characters, saved with Notepad++ and encoded in UCS-2.

Reading and writing xls and doc files in C

I have a particular problem where I have to write a C program that reads numerical data from a text file. The data is tab-delimited. Here is a sample from the text file.
1 23099 345565 345569
2 908 66766 66768
This is data for clients, and each client has a row. The columns are customer number, previous balance, previous reading, and current reading. I then have to generate a .doc document that summarizes all this information and calculates the balance. I can write a function that does this, but how do I create an .xls document and a Word document where all the results are summarized by the program? The text file has only numerical data. Any ideas?
The easiest way is to create a CSV file, not an XLS file.
Office opens CSV files with good results.
And it is far easier to create an ASCII text file with comma-separated values than to produce something in a closed format like the MS Office formats.
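A sketch of that conversion, including a balance column (shown in Python for brevity; the same logic is a short fopen/fscanf/fprintf loop in C, and the balance formula here, previous balance plus units used, is a guess at what the asker intends):

import csv

with open("clients.txt") as src, open("summary.csv", "w", newline="") as dst:
    out = csv.writer(dst)
    out.writerow(["customer_no", "prev_balance", "prev_reading",
                  "curr_reading", "balance"])
    for line in src:
        cust, bal, prev, curr = (int(x) for x in line.split())
        out.writerow([cust, bal, prev, curr, bal + (curr - prev)])  # hypothetical formula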
The simplest way to create a spreadsheet that contains formulas and formatting, and can be opened by Excel, is to create an XML Spreadsheet file.
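For illustration, a minimal sketch that emits such a file (the element names come from the XML Spreadsheet 2003 schema; Excel opens the result directly):

# Write the two sample rows as an XML Spreadsheet 2003 workbook.
rows = [(1, 23099, 345565, 345569), (2, 908, 66766, 66768)]
body = "".join(
    "<Row>" + "".join(f'<Cell><Data ss:Type="Number">{v}</Data></Cell>'
                      for v in row) + "</Row>"
    for row in rows
)
with open("summary.xml", "w") as f:
    f.write('<?xml version="1.0"?>'
            '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
            ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
            f'<Worksheet ss:Name="Summary"><Table>{body}</Table></Worksheet>'
            '</Workbook>')

Formulas can be added later with an ss:Formula attribute on a Cell, which is the main advantage this format has over plain CSV.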
