How does SQLite save the data of multiple tables in a single file?

I am working on a project to create a simplified version of the SQLite database. I got stuck trying to figure out how it manages to store the data of multiple tables, each with a different schema, in a single file. I suppose it must be using some indexes to map the data of the different tables. Can someone shed more light on how it's actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but I am looking for an easier way to understand it better and faster.

The schema is the list of all entities (tables, views, etc.) in the database as a whole, rather than the database consisting of many schemas on a per-entity basis.
The data itself is stored in pages, each page being owned by a single entity. It is these pages that are saved to the file.
The default page size is 4K, and you will notice that the file size is always a multiple of 4K. You could also experiment: create a database with some tables, note its size, then add some data, and, if the added data does not require another page, see that the size of the file stays the same. This demonstrates that it is all about pages rather than a linear/contiguous stream of data.
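You can check this for yourself from the sqlite3 command-line shell; the pragmas below are standard SQLite:
PRAGMA page_size;   -- bytes per page, e.g. 4096
PRAGMA page_count;  -- number of pages allocated in the file
-- The database file size on disk equals page_size * page_count.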
The schema itself is saved in a table called sqlite_master. This table has the following columns:
type (the type of entity, e.g. table, index, view or trigger),
name (the name given to the entity),
tbl_name (the table to which the entity applies),
rootpage (the number of the entity's first page, i.e. the root of its b-tree),
sql (the SQL used to generate the entity, if any).
Note that another schema table, sqlite_temp_master, may also exist if there are temporary tables.
For example, running SELECT * FROM sqlite_master; could return something like the rows shown below.
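As a minimal illustration (hypothetical table and index names; the rootpage values are simply what a fresh database would typically assign):
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX customer_name ON customer (name);
SELECT type, name, tbl_name, rootpage FROM sqlite_master;
-- type   name           tbl_name   rootpage
-- table  customer       customer   2
-- index  customer_name  customer   3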
See the SQLite file format documentation, section 2.6, "Storage Of The SQL Database Schema", for the full details.


Need a clever way to get orders from all stores while each store is in a different database

The setup
I have the following database setup:
CentralDB
Table: Stores
Table: Users
Store1DB
Table: Orders
Store2DB
Table: Orders
Store3DB
Table: Orders
Store4DB
Table: Orders
... etc
CentralDB contains the users, logging, and a Stores table with the name of each store database and general information about each store, such as address, name, description, image, etc.
All the store databases use the same structure, just different data.
It is important to know that the list of stores will shrink and grow in the future.
The main client communicating with this setup is a REST API service, which gets passed a STOREID in the header of each request telling it which database to connect to. This works flawlessly so far.
The reasoning
Whenever we need to do database maintenance on one store, we don't want all other stores to be down.
Backup management should be per store
Not having to write the WHERE storeID=x every time and for every table
Performance: each store could run on its own database server if the need arises
The goal
I need my REST API Service to somehow get all orders from all stores in one query.
Will you help me figure out a way to do this without hardcoding all the store database names? I was thinking about a stored procedure on the CentralDB, but I was hoping there would be other solutions. In any case, it has to be very efficient.
One option would be to have a list of databases stored in a "system" table in CentralDB.
Then you could create a stored procedure that reads the database names from that table, loops through them with a cursor, and generates dynamic SQL that UNIONs the results from all the databases. This way you get a single recordset of results.
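A rough sketch of such a procedure body, assuming SQL Server and a DatabaseName column on CentralDB.dbo.Stores (all object names here are illustrative):
DECLARE @sql NVARCHAR(MAX) = N'';
DECLARE @db SYSNAME;
DECLARE store_cursor CURSOR FOR
    SELECT DatabaseName FROM CentralDB.dbo.Stores;
OPEN store_cursor;
FETCH NEXT FROM store_cursor INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Append one SELECT per store database, tagged with the database name.
    IF @sql <> N'' SET @sql += N' UNION ALL ';
    SET @sql += N'SELECT ' + QUOTENAME(@db, '''') + N' AS StoreDB, o.* FROM '
              + QUOTENAME(@db) + N'.dbo.Orders AS o';
    FETCH NEXT FROM store_cursor INTO @db;
END
CLOSE store_cursor;
DEALLOCATE store_cursor;
EXEC sp_executesql @sql;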
However, this database design is IMHO flawed. There is no reason to use multiple databases to store data that belongs to the same domain. All the reasons you have mentioned can be addressed by a single database with a proper design. Having multiple databases will create multiple problems in the long term:
you will need to change the structure of all the databases whenever you modify your database model
you will need to create/drop databases when stores are added to/removed from your system
you will need to duplicate items and other entities that are common to all the stores in every database
reporting requirements (e.g. getting sales data for stores 1 and 2 together) will require complex UNION queries
etc.
In the long term, managing and maintaining this model will be a big pain.
I'd maintain a set of views that UNION ALL the data from all the store databases. Every time a store is added or deleted, those views must be updated; this can be automated.
The views provide the illusion to the application that there is only one database.
What I would not do is have each SQL query or procedure look up all the database names and build dynamic SQL. That would entail a lot of code duplication and an unnecessary loss of performance, and the approach is error prone. Better to generate the code once, in a central place, and have all other SQL code reference that generated code.
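A sketch of one such generated view (SQL Server syntax, using the store database names from the question; the generation script would re-create it whenever a store is added or removed):
CREATE VIEW dbo.AllOrders AS
    SELECT 'Store1DB' AS StoreDB, o.* FROM Store1DB.dbo.Orders AS o
    UNION ALL
    SELECT 'Store2DB' AS StoreDB, o.* FROM Store2DB.dbo.Orders AS o
    UNION ALL
    SELECT 'Store3DB' AS StoreDB, o.* FROM Store3DB.dbo.Orders AS o;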

How to attach and view PDF documents in an Access database

I have a very simple database in Access, but for each record I need to attach a scanned-in document (probably a PDF). What is the best way to do this? The database should not just link to a file on the PC; it should copy and keep the file with it, so that if the original file goes missing, or the database is moved or copied, the file is still accessible from within the database. Is this possible, and what is the easiest way of doing it? If necessary I can write a macro, I just don't know where to start. Also, when I display a report of the table, I would like to see thumbnails of the documents.
Thank you.
As the other answerers have noted, storing file data inside a database table can be a questionable practice. That said, I wouldn't personally rule it out, though if you are going to take that option, I'd strongly suggest splitting the file data out into its own table in its own back-end file. For example:
Create a new database file called Scanned files.mdb (or Scanned files.accdb).
Add a single table called Scans with fields such as FileID (AutoNumber, primary key), MainTableID (matching the primary key of the main table in the main database file), FileName (Text), FileExt (Text) and FileData ('OLE Object', really just a BLOB; don't actually use OLE Objects because they will bloat the database horribly). A rough DDL sketch of this table follows the list.
Back in the frontend, add a reference to Scans as a linked table.
Use a bit of VBA to upload and extract files from the Scans table (if you're interested in the mechanics of this, post a separate question).
Use the VBA Shell routine (if you must) or ShellExecute from the Windows API (= the better option IMO) to open extracted data.
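For reference, a rough DDL sketch of the Scans table described above (Access/Jet SQL; LONGBINARY is the raw BLOB type, which avoids the bloat of true OLE Object storage):
CREATE TABLE Scans (
    FileID      AUTOINCREMENT CONSTRAINT pk_Scans PRIMARY KEY,
    MainTableID LONG,
    FileName    TEXT(255),
    FileExt     TEXT(10),
    FileData    LONGBINARY
);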
If you are using the newer ACCDB format, then you have the 'Attachment' field type available, as smk081 suggests. This basically does most of the above steps for you; however, doing things 'by hand' gives you greater flexibility, for example, it allows giving each file a 'DateScanned' or 'DateEffective' field.
That said, your requirement for thumbnails will require explicit coding whatever option you take. It might be possible to leverage the Windows file-previewing API, though I'd make certain thumbnails are a definite requirement before investigating this: Access VBA is powerful enough to encourage attempts at complex solutions, but frequently not clean and modern enough to allow fulfilling them in a particularly maintainable fashion.
There is an Attachment type under Data Type when you go into the Design View of your table, where you can add an attachment field. When you go into the Datasheet View of the table, you can select this field for a particular row and a window will open for you to specify the attachment. Be aware that this will cause your database to grow quickly in size if you add a lot of large attachments.
You can use an OLE field in a table, but I would really suggest you not use this approach. The database is going to be HUGE in no time, and you're going to regret it.
Instead, you should consider adding a field that stores the path to the file, and keep the files in one folder on your network. Then you can use a SHELL() command to open the file. What's the difference between restoring an Access database and restoring PDF files if something goes wrong? This will keep your database at a manageable size and reduce the possibility of corruption.

How are ABAP data clusters stored in the database?

It's possible to store data clusters inside the database using the EXPORT and IMPORT statements, together with a dictionary table that adheres to a template (it must at least have the fields MANDT, RELID, SRTFD, SRTF2, CLUSTR and CLUSTD).
Here are two example statements that store/retrieve an entire internal table ta_test as a data cluster with the name testtab and id TEST in the database, using dictionary table ztest and area AA:
export testtab = ta_test to database ztest(AA) id 'TEST'.
import testtab = ta_test from database ztest(AA) id 'TEST'.
Looking at the contents of the ztest table, I see the following records (first 4 fields are the primary key):
MANDT 200
RELID AA
SRTFD TEST
SRTF2 0 (auto-incremented for each record)
CLUSTR an integer value with a maximum of 2,886
CLUSTD a 128-character hexadecimal string
I've also noticed that the amount of data stored this way is much less than the data that was inside the internal table (for instance, 1,000,000 unique records in the internal table result in only 1,703 records in the ztest table). Setting compression off on the export statement does increase the number of records, but it is still far fewer.
My questions: does anyone know how this works exactly? Is the actual data stored elsewhere and does ztest contain pointers to it? Compression? Encryption? Is the actual data accessible directly from the database (skipping the ABAP layer)?
The internal format of data clusters is not documented (at least not in an official documentation). From my experience, it does contain the entire data and not just pointers: Transporting the table entries to a different system -- as you do frequently when transporting ALV list layouts -- is sufficient to move the contents over. Moreover, the binary blob does not seem to contain much information about the data structure - if you change the source/target structure in an incompatible way, you risk losing data. Direct access from the database layer is not possible (this is actually stated in numerous places all over the documentation). It might be possible to reverse-engineer the marshalling/unmarshalling algorithm, but why bother when you've got the language statement to access the contents at hand?

How to store data's change history?

I need to store the change history of data in a database. For example, at some point a user modifies some property of some record. The expected result is that we can get the change history for a record, like:
Tom changed title to 'Title one'
James changed name to 'New name'
Steve added new_tag 'tag23'
Based on these change histories we can get all versions of the record.
Any good ideas on how to achieve this? It is not limited to a traditional relational database.
These are commonly called audit tables. The way I generally manage this is with triggers on the database: for every insert/update on a source table, the trigger copies the data into another table with the same name plus an _AUDIT suffix (the naming convention does not matter; it's just what I use). Oracle provides something called journal tables: using Oracle Designer (or manually) you can achieve the same thing, and developers often put a _JN suffix on the journal/audit table. It works the same way, with triggers on the source table copying data into the audit table.
EDIT:
I should also note that you can create a separate schema just for your audit tables, or you can keep them in the same schema as the source tables. I do both; it just depends on the situation.
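A minimal sketch of the trigger approach (Oracle syntax, hypothetical table and column names):
CREATE TABLE documents (
    doc_id NUMBER PRIMARY KEY,
    title  VARCHAR2(200),
    name   VARCHAR2(100)
);
CREATE TABLE documents_audit (
    doc_id       NUMBER,
    title        VARCHAR2(200),
    name         VARCHAR2(100),
    audit_action VARCHAR2(10),
    audit_user   VARCHAR2(30),
    audit_ts     TIMESTAMP
);
CREATE OR REPLACE TRIGGER trg_documents_audit
AFTER INSERT OR UPDATE ON documents
FOR EACH ROW
DECLARE
    v_action VARCHAR2(10);
BEGIN
    IF INSERTING THEN
        v_action := 'INSERT';
    ELSE
        v_action := 'UPDATE';
    END IF;
    -- Copy the new row state into the audit table, along with who did it and when.
    INSERT INTO documents_audit (doc_id, title, name, audit_action, audit_user, audit_ts)
    VALUES (:NEW.doc_id, :NEW.title, :NEW.name, v_action, USER, SYSTIMESTAMP);
END;
/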
I wrote an article about various options: http://blog.schauderhaft.de/2009/11/29/versioned-data/
If you are not tied to a relational database, there are things called 'append only' databases (I think), which never change data, but only append new versions. For your case this sounds kind of perfect. Unfortunately I don't know of any implementation.

What purpose does a .DDF file have?

Hey, can someone tell me what the FILE, FIELD and INDEX .DDF files do in Pervasive? Do they have to be changed or updated when a table definition changes? Any insight would be greatly appreciated.
Cheers.
FILE.DDF links the underlying Btrieve Data files to a logical table name.
FIELD.DDF uses the File Id from FILE.DDF to define all of the fields including offsets, data types, etc for each table.
INDEX.DDF defines the indexes on the fields in FIELD.DDF.
They are the field-information metadata used by PSQL to access the data files through the relational access methods (ODBC, OLE DB, ADO.NET, etc.).
They do have to be changed if the underlying data file is changed through Btrieve. If the table definition is changed through SQL (such as ALTER TABLE statements), the Pervasive Control Center, DTI (Distributed Tuning Interface), DTO (Distributed Tuning Object), PDAC, ActiveX, or DDF Builder, then the DDFs are updated automatically.
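As a rough illustration of how the three files relate (Pervasive exposes the DDFs through system tables named X$File, X$Field and X$Index; the column names below are from memory and may vary between versions, so treat this purely as a sketch):
SELECT f.Xf$Name   AS TableName,
       f.Xf$Loc    AS BtrieveFile,
       e.Xe$Name   AS FieldName,
       e.Xe$Offset AS FieldOffset,
       e.Xe$Size   AS FieldSize
FROM X$File f
JOIN X$Field e ON e.Xe$File = f.Xf$Id
ORDER BY f.Xf$Name, e.Xe$Offset;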
