Temporary storage for excel file processing - sql-server

I am developing a web application with Java EE technologies (Spring, Spring MVC, Hibernate). The application parses an Excel file, and I need to add the data to a SQL Server database.
Before adding the data, I need to ask the user, for each row in the Excel file, whether they really want it added to the database.
I could do it like this:
First save the data to a temporary table (table_tmp).
Then display the data to the user and, based on their input, add the selected rows to the actual table and remove them from the temporary table.
But I think there must be a better solution (some kind of temporary storage that I can delete after getting the user's input).
Can I use some NoSQL solution for this?

Why do you need to store it in the first place? You want to allow them to upload that excel, and come back way after their session expired and select rows they want? Do you really have to make it persistent?
If yes, then do you have any problems with your current setup? If not, you'll be introducing another external component that needs to be administered and that you have to interface with (what happens if MongoDB is down, the disk is full, the connection times out, ...) just to keep a temporary file.
But if you still want to do it, you might first consider something really simple (and fast), like memcached, an in-memory key-value store.
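If the rows do not need to outlive the user's session, they can simply stay in memory until the user confirms. The following is a minimal sketch of that idea, not a definitive implementation: the class `PendingUploads`, its methods, and the 30-minute expiry are all hypothetical names and values chosen for illustration, and the parsed rows are assumed to be plain dictionaries.

```python
import time
import uuid


class PendingUploads:
    """Holds parsed rows in memory until the user confirms or the entry expires."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (expiry time, list of parsed rows)

    def put(self, rows):
        """Store parsed rows and return a token to keep in the user's session."""
        self._purge_expired()
        token = uuid.uuid4().hex
        self._entries[token] = (self._clock() + self._ttl, list(rows))
        return token

    def confirm(self, token, selected_indexes):
        """Return only the rows the user selected, and forget the upload."""
        self._purge_expired()
        entry = self._entries.pop(token, None)
        if entry is None:
            raise KeyError("upload expired or unknown")
        _, rows = entry
        return [rows[i] for i in selected_indexes]

    def _purge_expired(self):
        now = self._clock()
        expired = [t for t, (expiry, _) in self._entries.items() if expiry <= now]
        for token in expired:
            del self._entries[token]


# Usage: parse the Excel file, park the rows, then insert only the confirmed ones.
uploads = PendingUploads(ttl_seconds=1800)
token = uploads.put([{"name": "Ann"}, {"name": "Bob"}, {"name": "Cid"}])
rows_to_insert = uploads.confirm(token, selected_indexes=[0, 2])
```

Only `rows_to_insert` ever reaches the real table, so no temporary table has to be created or cleaned up. The trade-off is that pending uploads are lost if the server restarts.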

Related

can we use JSON as a database?

I'm looking for fast and efficient data storage to build my PHP-based web site. I'm aware of MySQL. Can I use a JSON file in my server root directory instead of a MySQL database? If yes, what is the best way to do it?
You can use any single file, including a JSON file, like this:
Lock it somehow (search for PHP file locking; it may be as simple as calling flock() on the open file handle).
Read the data from the file and parse it into an internal data structure.
Optionally modify the data in the internal data structure.
If you modified the data, truncate the file to 0 length and write new data to it.
Unlock the file as soon as you can, other requests may be waiting...
You can keep using the data in the internal structures to render the page; just remember it may be outdated as soon as you release the file lock, because another HTTP request can modify it.
Also, if you modify the data from a user's web form, remember that it may have been modified in the meantime. For example: one user loads the page to edit a user's details, then another user deletes that user, then the editor tries to save the changed details and should probably get an error instead of re-creating the deleted user.
Note: This is very inefficient. If you are building a site where you expect more than, say, 10 simultaneous users, you have to use a more sophisticated scheme, or just use an existing database. Also, you can't have too much data, because parsing JSON and generating the modified JSON takes time.
As long as you have just one user at a time, it'll just get slower and slower as the amount of data grows. But as the user count increases, and more users means both more requests and more data, things slow down disproportionately, and you very soon hit the limit where HTTP requests start to time out before the file is available for handling the request.
At that point, do not try to hack it to make it faster; instead pick an existing database (SQL, NoSQL, or file-based). If you start hacking together your own, you just end up re-inventing the wheel, usually poorly :-). Well, unless it is just a programming exercise, but even then it might be better to learn to use an existing framework instead.
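The lock, read, modify, truncate, write, unlock sequence from the steps above can be sketched as follows. The question is about PHP; this sketch uses Python's `fcntl.flock`, which is POSIX-only, and the function name `update_json_file` and the `visits` counter are hypothetical names for illustration.

```python
import fcntl
import json
import os
import tempfile


def update_json_file(path, modify):
    """Lock the file, load it, apply modify(data), write it back, unlock."""
    # O_CREAT creates the file on first use without truncating an existing one.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release the lock
        try:
            text = f.read()
            data = json.loads(text) if text else {}
            modify(data)
            f.seek(0)
            f.truncate()  # cut the file to 0 length before writing the new JSON
            json.dump(data, f)
            f.flush()
            return data
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release as soon as possible


def add_visit(data):
    data["visits"] = data.get("visits", 0) + 1


path = os.path.join(tempfile.mkdtemp(), "site.json")
update_json_file(path, add_visit)
result = update_json_file(path, add_visit)
```

The returned `data` can still be used to render the page, but as noted above it may already be stale once the lock has been released.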
I wrote an Object Document Mapper for JSON files called JSON ODM. This may be a bit late, but if it is still needed, it is open source under the MIT Licence.
It provides a query language and some GeoJSON tools.
The new version of IBM Informix, 12.10 xC2, now supports JSON.
See: http://pic.dhe.ibm.com/infocenter/informix/v121/topic/com.ibm.json.doc/ids_json_007.htm
The manual says it is compatible with MongoDB drivers.
About the Informix JSON compatibility
Applications that use the JSON-oriented query language, created by
MongoDB, can interact with data stored in Informix® databases. The
Informix database server also provides built-in JSON and BSON (binary
JSON) data types.
You can use MongoDB community drivers to insert, update, and query
JSON documents in Informix.
I'm not sure, but I believe you can use the Innovator-C edition (free for production) to test it and to run it in a production environment at no cost.
One obvious case where you might prefer JSON (or another file format) over a database is when all of your (relatively small) data is kept in the application cache.
When the application server (re)starts, the application reads the data from the file(s) and stores it in a data structure.
When the data changes, the application updates the file(s).
Advantage: no database.
Disadvantage: for a number of reasons, this can be used only for systems with relatively small amounts of data. For example, a very specific product site with several hundred products.
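A minimal sketch of this load-at-startup, write-on-change pattern is shown below. The class `JsonBackedStore` and the product data are hypothetical, and the write-to-a-temp-file-then-rename step is an added assumption (not part of the answer) so that a crash mid-write does not leave a half-written file.

```python
import json
import os
import tempfile


class JsonBackedStore:
    """Keeps all data in memory; the JSON file is only read at startup."""

    def __init__(self, path):
        self._path = path
        self._items = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._items = json.load(f)

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value
        self._save()

    def _save(self):
        # Write to a temporary file first, then swap it into place.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self._items, f)
        os.replace(tmp_path, self._path)


path = os.path.join(tempfile.mkdtemp(), "products.json")
store = JsonBackedStore(path)
store.put("sku-1", {"name": "Widget", "price": 9.5})

restarted = JsonBackedStore(path)  # simulates an application restart
```

Every read is served from memory, which is why this only suits data sets small enough to load whole. The sketch also assumes a single application process; several processes writing the same file would need locking.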

Storing data in text files instead of SQL Server

I intend to use both SQL Server and simple text files to store my data.
Information such as user data will be stored in SQL Server. The RSS feed for each user will be stored in a folder named after the user ID; inside this folder I put the files that hold the data. Each file can hold only 20 lines; if there are more than 20, I create a new file.
When I need to read this data, I simply open the last file in the user's folder.
What are the advantages and disadvantages of this method?
Thanks.
I would suggest storing the text file data in a VARCHAR(8000) or BLOB column in a database table.
The advantages of storing it in the database are:
All your data is stored in a single place. It is easy to back up and restore elsewhere, if required.
A database comes with concurrency control by default: if multiple users try to access the same row or the same table, the database handles it for you.
With a hybrid of files and database, you have distributed storage and must always make sure the two stay consistent.
If you just want to store the latest text file content, use UPDATE. If you want to keep a history of earlier content, use SCD Type 2 (slowly changing dimension) storage or a history table containing the previous text file data.
A database is a single self-contained unit, so transparent data encryption, masking, access control, and other security features are all managed in one place. In a hybrid approach, you have to manage security in two places.
When all your data is in a single place and you have proper indexes, you can write SQL queries to cover many different reporting use cases. If the data is distributed, you have to work out how to handle each of those reporting use cases yourself.
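The "UPDATE for the latest content, history table for earlier content" option could look roughly like this. This is a sketch under stated assumptions: SQLite (via Python's `sqlite3`) stands in for SQL Server, and the table names `user_feed` and `user_feed_history` and the function `save_feed` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_feed (
        user_id INTEGER PRIMARY KEY,
        content TEXT NOT NULL
    );
    CREATE TABLE user_feed_history (
        user_id     INTEGER NOT NULL,
        content     TEXT NOT NULL,
        replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def save_feed(conn, user_id, content, keep_history=True):
    """Store the latest feed content, optionally archiving what it replaces."""
    with conn:  # one transaction: commits on success, rolls back on error
        old = conn.execute(
            "SELECT content FROM user_feed WHERE user_id = ?", (user_id,)
        ).fetchone()
        if old is None:
            conn.execute(
                "INSERT INTO user_feed (user_id, content) VALUES (?, ?)",
                (user_id, content),
            )
            return
        if keep_history:
            conn.execute(
                "INSERT INTO user_feed_history (user_id, content) VALUES (?, ?)",
                (user_id, old[0]),
            )
        conn.execute(
            "UPDATE user_feed SET content = ? WHERE user_id = ?",
            (content, user_id),
        )


save_feed(conn, 7, "first feed snapshot")
save_feed(conn, 7, "second feed snapshot")
current = conn.execute(
    "SELECT content FROM user_feed WHERE user_id = 7"
).fetchone()[0]
history = [r[0] for r in conn.execute(
    "SELECT content FROM user_feed_history WHERE user_id = 7"
)]
```

Because the archive insert and the update run in one transaction, the current row and its history cannot drift apart, which is the consistency guarantee a file-plus-database hybrid has to build by hand.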
The question is not quite correct.
You should start by clarifying the requirements for the application. Ask yourself the following questions:
What types of data operations need to be executed (selects, updates, reports)?
How many users will there be? How often will their requests come in? Must data be synchronized across users (concurrency)?
Do you need authentication, authorization, or localization?
Do you need modification history?
Etc.
Databases usually have all these mechanisms, so you do not have to implement them in your application.
Depending on your application's needs, you decide which strategy to use for storing the data: a database, files, or both.

Store user's images in a web application

I have a web application currently being developed in JSP/Servlets.
The application allows users to upload image files for each user account.
What is the best way to store the images?
Please consider the following requirements before answering:
1. I am using MySQL now, but I may move to a different database.
2. I can store the images as flat files, but I want the admin account to have an option to back up the complete database and reload it later. Ideally the backup/reload should work with the images too (even if the backup is taken on one physical machine and reloaded on another).
Using BLOB/CLOB is an option that solves problem 2.
But, what if my database becomes very large?
In your case, I strongly recommend having a BLOB field in your database and storing the images in it, mainly because this is the correct place for them. Then write a servlet that retrieves the image of the specified user from the database, for example /userimage?name=john.
About your "size/performance" problem:
Databases were made (among other things) to store and exchange large amounts of data.
So, they're the best option.
Even if you store them somewhere else, they will still take up space and affect performance.
If you really want to manage LARGE amounts of data (>= 3TB, not your case), you can store the images on a file system and save the file names in the DB. For more info, look at this question.
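As a rough illustration of the BLOB approach, the sketch below stores and retrieves an image by user name, which is the lookup a `/userimage?name=john` servlet would perform. It uses Python's `sqlite3` as a stand-in for MySQL; the table `user_image` and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_image (
        user_name TEXT PRIMARY KEY,
        mime_type TEXT NOT NULL,
        data      BLOB NOT NULL
    )
""")


def save_image(conn, user_name, mime_type, data):
    """Insert the user's image, replacing any previous one."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO user_image (user_name, mime_type, data) "
            "VALUES (?, ?, ?)",
            (user_name, mime_type, data),
        )


def load_image(conn, user_name):
    """Return (mime_type, image bytes), or None when the user has no image."""
    row = conn.execute(
        "SELECT mime_type, data FROM user_image WHERE user_name = ?",
        (user_name,),
    ).fetchone()
    return None if row is None else (row[0], bytes(row[1]))


save_image(conn, "john", "image/png", b"\x89PNG fake image bytes")
image = load_image(conn, "john")
```

Storing the MIME type next to the bytes lets the serving code set the response content type without inspecting the image. A database backup then covers the images as well, which is the second requirement in the question.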
Store them in the file system. It's faster and simpler. Often, when accessing an image, you're going to have to save it to a file anyway before you can utilize it. You can examine the images with third party tools. You can store the recordID in the filename to keep the image/record association from ever being broken.
Many others share this opinion: http://forums.asp.net/p/1512925/3610536.aspx
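For the file system approach, putting the record ID in the file name might look like the sketch below. The function names are hypothetical, and spreading files across subdirectories by `record_id % 1000` is an added assumption to keep any single directory small; it is not part of the answer.

```python
import os
import tempfile


def image_path(root, record_id, extension):
    """Build the path for a record's image; the ID in the name ties it to the row."""
    shard = f"{record_id % 1000:03d}"  # assumed sharding scheme, for illustration
    return os.path.join(root, shard, f"{record_id}.{extension}")


def save_image_file(root, record_id, extension, data):
    path = image_path(root, record_id, extension)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


root = tempfile.mkdtemp()
saved_path = save_image_file(root, 12345, "png", b"fake image bytes")
```

Since the path can be computed from the record ID alone, the database does not need to store it, but backups must then cover both the database and the image directory.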
Just store them in the DB... if your user base "becomes very large" you'll have buckets of cash to buy a balls-out database server (or even a farm of them) which can handle the load, now won't you?

Designing a generic unstructured data store

The project I have been given is to store and retrieve unstructured data from a third-party. This could be HR information – User, Pictures, CV, Voice mail etc or factory related stuff – Work items, parts lists, time sheets etc. Basically almost any type of data.
Some of these items may be linked, so a User may have a picture, for example. I don’t need to examine the content of the data, as my storage solution will receive the data as XML and send it out as XML. It’s down to the recipient to convert the XML back into a picture or sound file etc. The recipient may request all Users, so I need to be able to find User records and their related “child” items such as pictures etc., or the recipient may just want pictures etc.
My database is MS SQL Server and I have to stick with that. My question is: are there any patterns or existing solutions for handling unstructured data in this way?
I’ve done a bit of Googling and have found some sites that talk about this kind of problem but they are more interested in drilling into the data to allow searches on their content. I don’t need to know the content just what type it is (picture, User, Job Sheet etc).
To those who have given their comments:
The problem I face is linking objects together. A User object may be added to the data store, and at a later date the user’s picture may be added. When the User is requested, I need to return both the User object and its associated Picture. The user may update their picture, so you can see I need to keep relationships between objects. That is what I was trying to get across in the second paragraph. My solution must be very generic, as I should be able to store anything and link these objects according to the end user’s requirements, e.g. User, Pictures and emails, or Work items, Parts lists, etc. I see that Microsoft has developed Zentity, which looks like it may be useful, but I don’t need to drill into the data contents, so it’s probably overkill for what I need.
I have been using Microsoft Zentity since version 1. Whilst it is excellent at storing huge amounts of structured data and allowing (relatively) simple access to it, if your data structure is likely to change, then recreating the 'data model' (and the regression testing) would probably remove the benefits of using such a system.
Another point worth noting is that Zentity requires filestream storage, so you would need to have the correct version of SQL Server installed (2008, I think) and filestream storage enabled.
Since you deal with XML, it's not really unstructured data. Microsoft SQL Server 2005 and later have an XML column type that you can use.
Now, if you don't need to access XML nodes and you think you never will, go with plain varbinary(max). For your information, storing XML content in an XML-type column lets you not only retrieve XML nodes directly through database queries but also validate the XML data against schemas, which may be useful to ensure that the content you store is valid.
Don't forget to use FILESTREAMs (SQL Server 2008 or later), if your XML data grows in size (2MB+). This is probably your case, since voice-mail or pictures can easily be larger than 2 MB, especially when they are Base64-encoded inside an XML file.
Since your data is quite freeform and changeable, your best bet is to put it on a plain old file system, not in a relational database. By all means store some meta-information in SQL where it makes sense to search through structured data relationships, but if your main data content is not structured with data relationships, then you're doing yourself a disservice by using an SQL database.
The filesystem is blindingly fast to lookup files and stream them, especially if this is an intranet application. All you need to do is share a folder and apply sensible file permissions and a large chunk of unnecessary development disappears. If you need to deliver this over the web, consider using WebDAV with IIS.
A reasonably clever file and directory naming convention, with a small piece of software you write to help people get to the right path, will hands down always beat any SQL database for both access speed and sequential data streaming. Filesystem paths and file names will always beat any clever SQL index for data location speed. And plain old files are the ultimate unstructured, flexible data store.
Use SQL for what it's good for. Use files for what they are good for. Best tools for the job and all that...
You don't really need any pattern for this implementation. Store all your data in a BLOB entry. Read from it when required and then send it out again.
You would probably need to investigate other infrastructure aspects, like periodically cleaning up the DB to remove expired entries.
Maybe I'm not understanding the problem clearly.
So am I right if I say that all you need to store is a blob of xml with whatever binary information contained within? Why can't you have a users table and then a linked(foreign key) table with userobjects in, linked by userId?
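One way to sketch the linked-table idea generically is a single table in which every stored item carries its type, its XML payload, and an optional foreign key to a parent item. This is an illustration only: SQLite (via Python's `sqlite3`) stands in for MS SQL Server, and the table `stored_object`, its columns, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("""
    CREATE TABLE stored_object (
        id          INTEGER PRIMARY KEY,
        object_type TEXT NOT NULL,
        parent_id   INTEGER REFERENCES stored_object(id),
        payload_xml TEXT NOT NULL
    )
""")


def add_object(conn, object_type, payload_xml, parent_id=None):
    """Store one item; parent_id links a child (e.g. a Picture) to its User."""
    with conn:
        cur = conn.execute(
            "INSERT INTO stored_object (object_type, parent_id, payload_xml) "
            "VALUES (?, ?, ?)",
            (object_type, parent_id, payload_xml),
        )
    return cur.lastrowid


def with_children(conn, object_id):
    """Return the object's payload first, then its direct children's payloads."""
    rows = conn.execute(
        "SELECT payload_xml FROM stored_object "
        "WHERE id = ? OR parent_id = ? ORDER BY id <> ?, id",
        (object_id, object_id, object_id),
    ).fetchall()
    return [r[0] for r in rows]


def by_type(conn, object_type):
    """Return every payload of one type, e.g. all pictures."""
    rows = conn.execute(
        "SELECT payload_xml FROM stored_object WHERE object_type = ? ORDER BY id",
        (object_type,),
    ).fetchall()
    return [r[0] for r in rows]


user_id = add_object(conn, "User", "<user name='ann'/>")
add_object(conn, "Picture", "<picture>base64 data</picture>", parent_id=user_id)
user_with_items = with_children(conn, user_id)
pictures = by_type(conn, "Picture")
```

The store never looks inside `payload_xml`; it only needs the type and the parent link, which matches the requirement to return a User together with its child items or to return all items of one type.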

Concurrently access database with Excel as frontend - doable?

Suppose you have a database whose largest tables contain about 200,000 rows and are frequently modified. The client wants Excel to connect to the database via ODBC and work as a frontend to manage the data. The data should be modifiable by up to 25 users concurrently.
My first instinct would be to recommend something else, for example a web frontend. But suppose the client insists on the Excel solution, would you regard it as doable, and what pitfalls would you see in it?
My doubts would be about:
data integrity (how to manage users modifying same data at the same time)
large amounts of data moved unnecessarily (when opening the Excel workbook I imagine that the whole database has to be transferred)
security (showing only parts of data to appropriate users in a secure way would be challenging - see previous point)
using a tool (Excel) for something, in which it doesn't excel (pardon the pun)
I do this all the time. No you don't have to bring in the whole database or even the whole table. I use ADO and VBA and send SQL statements via the Command object. For example, I have a royalty database with an Excel front end.
The user types in an invoice number and a SELECT statement retrieves that one record and populates some custom classes. The user enters/modifies some data and clicks 'Save'. Then the class has a method that writes the record back to the database with an UPDATE or INSERT, depending on the situation.
At the end of the month, the user enters a date range and retrieves some records into a report, again just a SELECT statement filling some classes and outputting to a sheet.
Use Transactions so you can roll back if you hit any record locking problems, but with 25 users you probably won't.
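The fetch-one-record, then UPDATE-or-INSERT pattern described above is sketched here in Python with `sqlite3` rather than in the answer's ADO and VBA. The `invoice` table, its columns, and the function names are hypothetical and only illustrate the shape of the approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (invoice_no TEXT PRIMARY KEY, amount REAL NOT NULL)"
)
conn.commit()


def fetch_invoice(conn, invoice_no):
    """Retrieve a single record instead of the whole table."""
    row = conn.execute(
        "SELECT invoice_no, amount FROM invoice WHERE invoice_no = ?",
        (invoice_no,),
    ).fetchone()
    return None if row is None else {"invoice_no": row[0], "amount": row[1]}


def save_invoice(conn, record):
    """Write the record back with an UPDATE, or an INSERT if it is new."""
    with conn:  # transaction: rolled back automatically if a statement fails
        cur = conn.execute(
            "UPDATE invoice SET amount = ? WHERE invoice_no = ?",
            (record["amount"], record["invoice_no"]),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO invoice (invoice_no, amount) VALUES (?, ?)",
                (record["invoice_no"], record["amount"]),
            )


save_invoice(conn, {"invoice_no": "INV-1", "amount": 100.0})  # inserts
save_invoice(conn, {"invoice_no": "INV-1", "amount": 120.0})  # updates
invoice = fetch_invoice(conn, "INV-1")
```

Only the one requested row crosses the wire, and the parameter placeholders keep user-typed invoice numbers out of the SQL text.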
At first glance I would suggest treating Excel a bit like a web page, that is, pull only the required data and use a specific form for editing that updates one record at a time via ADO. You need only lock a single record, and only for the fraction of time it takes to update. You can check whether or not the record has changed since it was opened for editing, and users can be told not to open a record for editing and then leave it sitting in the edit form, or they may lose their changes.
It is usually quite unlikely for such a small group to need to change the same record at the same time.
I do not think you will have much trouble with 25 concurrent users.
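Checking whether a record has changed since it was opened for editing is often done with a version column. The sketch below shows one way this could work, again in Python with `sqlite3` as a stand-in for ADO against the real database; the `customer` table, the `version` column, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        version INTEGER NOT NULL DEFAULT 1
    )
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
conn.commit()


def open_for_edit(conn, customer_id):
    """Load the record and remember which version the user saw."""
    row = conn.execute(
        "SELECT name, version FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": customer_id, "name": row[0], "version": row[1]}


def save_edit(conn, record, new_name):
    """Return True if saved, False if someone else changed the row first."""
    with conn:
        cur = conn.execute(
            "UPDATE customer SET name = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_name, record["id"], record["version"]),
        )
    return cur.rowcount == 1


# Two users open the same record; the second save is rejected.
first_user = open_for_edit(conn, 1)
second_user = open_for_edit(conn, 1)
first_saved = save_edit(conn, first_user, "Anne")
second_saved = save_edit(conn, second_user, "Anna")
final_name = conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()[0]
```

No lock is held while the edit form is open; the row is touched only for the duration of the UPDATE, and a stale edit is detected rather than silently overwriting the newer data.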
