How are the individual SAN elements controlled?

In a SAN environment, we would have multiple storage devices (say 1 TB each), so cumulatively the SAN would provide several terabytes of storage capacity.
Which software is responsible for slicing this storage capacity and allocating a piece of it to each VM (say 500 GB per VM)? Where does it reside?
I am finding it hard to picture this concept.

Depending on the technology involved, there are multiple ways to do this. For example, in block-storage environments, LUNs from different storage systems can be concatenated, striped, mirrored, or RAIDed by volume manager software on the target server. The same effect can be achieved by hardware virtualisation on the storage systems themselves: for example, one storage device can act as a "roof" over all the other devices (also look at the thin-provisioning topic). In the NAS world, it is possible to build big trees of filesystems by using different mount points for different storage systems.
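To make the slicing step easier to picture, here is a minimal, purely conceptual Python sketch of what a volume manager (or a hypervisor's datastore layer) does: it pools the raw LUNs and carves logical volumes of the requested size out of the aggregate capacity. All names and sizes are made up for illustration; this is not any real product's code.

from dataclasses import dataclass, field

@dataclass
class StoragePool:
    """Toy model of a volume manager's storage pool."""
    luns: dict = field(default_factory=dict)      # LUN id -> capacity in GB
    volumes: dict = field(default_factory=dict)   # volume name -> size in GB

    def add_lun(self, lun_id: str, capacity_gb: int) -> None:
        self.luns[lun_id] = capacity_gb

    @property
    def free_gb(self) -> int:
        return sum(self.luns.values()) - sum(self.volumes.values())

    def create_volume(self, name: str, size_gb: int) -> None:
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        # A real volume manager would map this to extents on the LUNs;
        # the VM just sees one block device of the requested size.
        self.volumes[name] = size_gb

pool = StoragePool()
for i in range(4):                     # four 1 TB LUNs presented by the SAN
    pool.add_lun(f"lun{i}", 1024)
pool.create_volume("vm1-disk", 500)    # 500 GB carved out for one VM
print(pool.free_gb)                    # remaining pooled capacity

The point of the sketch is where the logic lives: either on the consuming server (a volume manager such as LVM) or inside the storage/virtualisation layer itself, which is why the question "where does it reside?" has more than one valid answer.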

Related

Can we use server SAS 2.5" hard drives for SAN?

I have 2.5" SAS hard drives in a lot of servers. Now we are planning to buy a SAN and want to utilize the existing hard drives. Can we use those hard drives in SAN storage? Someone told me that server storage and SAN storage is different.
There are different types of SAN storage systems. If you decide to buy an enterprise-class storage system, the vendor will force you to buy their drives as well.
If you buy a custom-built system, you should check its specifications. Keep in mind that drives are a very important part of a storage system; in highly loaded environments drives can die very often, so reusing old drives can bring you more issues than benefits.

Distributed file system vs mounting a drive over the network.

What are the pros and cons of an off-the-shelf distributed file system (like HadoopFS) versus just mounting a drive over the network on Linux? As I understand it, both approaches achieve the same thing: the same data becomes available at many remote locations.
Cheers!
Distributed filesystems provide many benefits, such as automatic backups or distribution of data according to configurable rules (you can, say, add many new nodes to your storage and that operation will be transparent to the applications using it).
Mounting drives can become a pain the day one of the machines in the network goes offline for some reason while your applications still rely on it.

Distributed File System For High Concurrent Access Of Small Files

What DFS technologies are out there for highly concurrent access (say, by 10,000 remote threads on a local 1 Gb/s network) to 1,000,000 files that are only in the MB size range, where the DFS should serve highly concurrent streams of them to users?
Common HPC filesystems such as Lustre or GPFS often do not provide good support for the scenario you describe; they are instead optimized for high bandwidth on large file accesses. In the HPC context you should consider using I/O middleware such as MPI-IO or high-level I/O libraries such as HDF5 rather than interfacing with the file system directly. Those libraries can hide the complexity of optimizing accesses to specific file systems from your application; which one is suitable depends on the structure of your application scenario.
On the other hand, for highly concurrent and unstructured small accesses, you might want to look into cloud-related technologies, e.g. the Google File System, distributed key-value stores, or Cassandra, just to give a few pointers for further research.
The general "file" abstraction and access approach (POSIX interface) was not designed for highly concurrent access which makes it difficult to conform with the interface and provide high concurrency at the same time.
If you want more specific hints for suitable technology, please provide some more specific information about your use-case(s).
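As a rough illustration of the library approach mentioned above, here is a sketch that packs many small "files" into a single HDF5 container with h5py, so readers hit one large, well-laid-out file instead of a million tiny ones. The file names, counts, and sizes are invented for the example; whether this fits depends entirely on your access pattern.

import numpy as np
import h5py

# Pack many small payloads into one HDF5 container (names/sizes invented).
with h5py.File("packed.h5", "w") as f:
    for i in range(1000):                       # in reality: ~1,000,000 files
        payload = np.random.bytes(1_000_000)    # ~1 MB of data per "file"
        f.create_dataset(f"files/doc_{i:07d}",
                         data=np.frombuffer(payload, dtype=np.uint8))

# Readers open the container read-only; many processes can do so in parallel.
with h5py.File("packed.h5", "r") as f:
    blob = f["files/doc_0000042"][...].tobytes()
    print(len(blob))

The design choice being illustrated is simply that the metadata cost of opening a million separate files usually dominates at this scale, so middleware that aggregates small objects into larger containers tends to help.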

Multi-tier vs Distributed?

Do multi-tier and distributed apps mean the same thing?
When we talk about layers in these apps, do we mean physical layers (database, browser, web server, ...) or logical layers (data access layer, business layer, ...)?
Maybe these two sentences convey the distinction between distributed and multi-tier intuitively:
Distributed: You replicate the processing amongst nodes
Multi-tier: You split the processing amongst tiers
In one case, the same processing is replicated over several nodes. In the other case, each tier has a distinct responsibility, and the processing running on each tier differs.
The two notions are not exclusive: you can have non-distributed multi-tier apps (if there is no form of redundancy/replication), distributed apps that are not multi-tier, and also multi-tier apps that are distributed (if they have some form of redundancy).
There would be a lot more to say about the distinction, but the difference (to me) is essentially there.
ewernli has the mostly correct answer here. The only missing piece from the original question concerns physical and logical layers.
From a distributed and/or multi-tier perspective, whether the layers are physically separate or only logically so is immaterial: it simply doesn't matter. You can create multi-tier and even distributed applications that reside entirely on the same machine instance.
That said, it is more common to separate the tiers into different machines built specifically for that type of load. For example, a web server and a database server. Or even a web server, several web services machines, and one or more database servers.
All of these features, distributed, multi-tier, and/or load balanced with logical and/or physical layers are just features of the application design.
Further, in today's world of virtual machines, it's entirely possible (and even likely) to set up a multi-tier, distributed, and load-balanced application within the confines of a single real machine, although I'd never recommend that course of action, because the point of load balancing and distributed services is usually to increase availability or throughput.
Multi-tier means that your application will be based on multiple machines with different tasks (database, web application, ...). Distributed means that your application will run on multiple machines at the same time; for example, your website could be hosted on 3 different servers.
For multi-tier applications we generally speak about physical layers, but in every application you can (and should) have different logical layers.

What database does Google use?

Is it Oracle or MySQL or something they have built themselves?
Bigtable
A Distributed Storage System for Structured Data
Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.
Some features
fast and extremely large-scale DBMS
a sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
designed to scale into the petabyte range
it works across hundreds or thousands of machines
it is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
each table has multiple dimensions (one of which is a field for time, allowing versioning)
tables are optimized for GFS (Google File System) by being split into multiple tablets - segments of the table split along a row chosen such that each tablet will be ~200 megabytes in size.
Architecture
BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."
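To make the data model concrete, here is a tiny Python sketch (my own illustration, not Google code) of a sparse multidimensional map: each value lives at a (row, column, timestamp) coordinate, and a read can ask for the n most recent versions of a cell. The row and column names reuse the webtable example from the paper.

from collections import defaultdict

# Sparse map: (row key, column) -> {timestamp: value}. Absent cells cost nothing.
table = defaultdict(dict)

def put(row, column, timestamp, value):
    table[(row, column)][timestamp] = value

def read_versions(row, column, n):
    """Return the n most recent versions of a cell, newest first."""
    cell = table.get((row, column), {})
    return sorted(cell.items(), reverse=True)[:n]

put("com.cnn.www", "contents:", 3, "<html>...v1...</html>")
put("com.cnn.www", "contents:", 5, "<html>...v2...</html>")
put("com.cnn.www", "contents:", 6, "<html>...v3...</html>")
put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")

print(read_versions("com.cnn.www", "contents:", 2))  # timestamps 6 and 5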
In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing. If one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to another machine that is not so busy. Also, if a machine goes down, its tablets may be spread across many other servers so that the performance impact on any given machine is minimal.
Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.
The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. Clients get a pointer to the META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
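The lookup described above can be pictured as a short chain of dictionary hops. The following toy Python model only illustrates the text (a client cache, then META0, then a META1 tablet, then the data tablet's server); all identifiers are invented.

# Toy model of the three-tiered tablet location lookup (illustrative only).
META0 = {"meta1-A": "server-7"}                         # the single META0 table
META1 = {"meta1-A": {"user-tablet-42": "server-3"}}     # META1 tablets -> data tablet locations
client_cache = {}                                       # clients cache prior lookups

def locate_tablet(tablet_id, meta1_tablet="meta1-A"):
    if tablet_id in client_cache:                  # cached: no metadata round trips
        return client_cache[tablet_id]
    _meta1_server = META0[meta1_tablet]            # hop 1: META0 points at a META1 tablet
    location = META1[meta1_tablet][tablet_id]      # hop 2: META1 points at the data tablet
    client_cache[tablet_id] = location             # cache to skip both hops next time
    return location

print(locate_tablet("user-tablet-42"))  # "server-3"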
Implementation
BigTable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.
Another service that BigTable makes heavy use of is Chubby, a highly available, reliable distributed lock service. Chubby allows clients to take a lock, possibly associating it with some metadata, which they can renew by sending keep-alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.
There are three primary server types of interest in the Bigtable system:
Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tasks as needed.
Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB - 200 MB). If a tablet server fails, then 100 tablet servers each pick up 1 new tablet and the system recovers.
Lock servers: instances of the Chubby distributed lock service. Lots of actions within BigTable require acquisition of locks, including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.
Example from Google's research paper:
A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page. CNN's home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca. Each anchor cell has one version; the contents column has three versions, at timestamps t3, t5, and t6.
API
Typical operations in BigTable are the creation and deletion of tables and column families, writing data, and deleting columns from a row. BigTable provides these functions to application developers in an API. Transactions are supported at the single-row level, but not across several row keys.
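For readers who want to try this model today, the public Cloud Bigtable service exposes essentially the same row/column-family/cell API. Below is a minimal sketch using the google-cloud-bigtable Python client; the project, instance, table, and column-family names are placeholders I made up, and the table with its column families is assumed to already exist.

from google.cloud import bigtable

# Assumes an existing Cloud Bigtable instance "my-instance" with a table
# "webtable" whose column families "contents" and "anchor" are already created.
client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("webtable")

# Write one row; all mutations to a single row are applied atomically,
# matching the row-level transaction guarantee described above.
row = table.direct_row(b"com.cnn.www")
row.set_cell("contents", b"html", b"<html>...</html>")
row.set_cell("anchor", b"cnnsi.com", b"CNN")
row.commit()

# Read the row back; each (family, qualifier) holds a list of versioned cells.
data = table.read_row(b"com.cnn.www")
print(data.cells["anchor"][b"cnnsi.com"][0].value)  # most recent version first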
Here is the link to the PDF of the research paper.
And here you can find a video showing Google's Jeff Dean in a lecture at the University of Washington, discussing the Bigtable content storage system used in Google's backend.
It's something they've built themselves - it's called Bigtable.
http://en.wikipedia.org/wiki/BigTable
There is a paper by Google on the database:
http://research.google.com/archive/bigtable.html
Spanner is Google's globally distributed relational database management system (RDBMS), the successor to BigTable. Google claims it is not a pure relational system because each table must have a primary key.
Here is the link to the paper.
Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
Another database invented by Google is Megastore. Here is the abstract:
Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore's semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.
As others have mentioned, Google uses a homegrown solution called BigTable and has released a few papers describing it to the outside world.
The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.
Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.
And it's maybe also handy to know that BigTable is not a relational database (like MySQL) but a huge (distributed) hash table, which has very different characteristics. You can play around with a limited version of BigTable yourself on the Google AppEngine platform.
Next to Hadoop mentioned above there are many other implementations that try to solve the same problems as BigTable (scalability, availability). I saw a nice blog post yesterday listing most of them here.
Google primarily uses Bigtable.
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size.
For more information, download the document from here.
Google also uses Oracle and MySQL databases for some of their applications.
Any more information you can add is highly appreciated.
Google services have a polyglot persistence architecture. BigTable is leveraged by most of its services, such as YouTube, Google Search, and Google Analytics. The search service initially used MapReduce for its indexing infrastructure but later transitioned to BigTable during the Caffeine release.
Google Cloud Datastore has over 100 applications in production at Google, facing both internal and external users. Applications like Gmail, Picasa, Google Calendar, Android Market & AppEngine use Cloud Datastore & Megastore.
Google Trends uses MillWheel for stream processing. Google Ads initially used MySQL and later migrated to F1 DB, a custom-written distributed relational database. YouTube uses MySQL with Vitess. Google stores exabytes of data across commodity servers with the help of the Google File System.
Source: Google Databases: How Do Google Services Store Petabyte-Exabyte Scale Data?
YouTube Database – How Does It Store So Many Videos Without Running Out Of Storage Space?