(1)

Google Bigtable

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Google, Inc.

OSDI 2006

Adapted by S. Sudarshan from a talk by Erik Paulson, UW Madison

(2)

Google Scale

Lots of data

Copies of the web, satellite data, user data, email and USENET, Subversion backing store

Many incoming requests

No commercial system big enough

Couldn’t afford it if there was one

Might not have made appropriate design choices

Firm believers in the End-to-End argument

450,000 machines (NYTimes estimate, June 14th, 2006)

(3)

Building Blocks

Scheduler (Google WorkQueue)

Google Filesystem

Chubby Lock service

Two other pieces helpful but not required

Sawzall

MapReduce (despite what the Internet says)

BigTable: builds a more application-friendly service on top of these building blocks

(4)

Google File System

Large-scale distributed “filesystem”

Master: responsible for metadata

Chunk servers: responsible for reading and writing large chunks of data

Chunks replicated on 3 machines, master responsible for ensuring replicas exist

OSDI ’04 Paper

(5)

Chubby

{lock/file/name} service

Coarse-grained locks, can store small amount of data in a lock

5 replicas, need a majority vote to be active

Also an OSDI ’06 Paper

(6)

Data model: a big map

<Row, Column, Timestamp> triple for key; lookup, insert, and delete API

Arbitrary “columns” on a row-by-row basis

Column name is family:qualifier; family is heavyweight, qualifier lightweight

Column-oriented physical store: rows are sparse!

Does not support a relational model

No table-wide integrity constraints

No multirow transactions
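The triple-keyed map on this slide can be sketched in a few lines of Python. This is an illustrative toy, not Bigtable's actual API; the class and method names are invented, and cells here are plain nested dicts rather than a sorted, persistent store:

```python
from collections import defaultdict

class TinyTable:
    """Toy model of Bigtable's data model: a map keyed by
    (row, "family:qualifier", timestamp), with lookup/insert/delete."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self.rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None."""
        cell = self.rows.get(row, {}).get(column)
        if not cell:
            return None
        return cell[max(cell)]  # newest timestamp wins

    def delete(self, row, column):
        self.rows.get(row, {}).pop(column, None)

t = TinyTable()
t.put("com.cnn.www", "contents:", 1, "<html>v1</html>")
t.put("com.cnn.www", "contents:", 2, "<html>v2</html>")
```

Note what is absent, matching the slide: no schema, no table-wide constraints, and each operation touches a single row only.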

(7)

SSTable

Immutable, sorted file of key-value pairs

Chunks of data plus an index

Index is of block ranges, not values

[Diagram: an SSTable is a sequence of 64K blocks plus an index over the block ranges]
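The block-index lookup can be made concrete with a small sketch. Names are hypothetical, and tiny entry-count blocks stand in for the real 64K byte blocks; the point is that the index covers block ranges, not individual values, so a lookup binary-searches the index and then scans one block:

```python
import bisect

class SSTable:
    """Toy immutable, sorted key-value file with a block index."""
    BLOCK_SIZE = 4  # entries per block; real blocks are ~64K of bytes

    def __init__(self, items):
        items = sorted(items)  # written sorted, never mutated afterwards
        self.blocks = [items[i:i + self.BLOCK_SIZE]
                       for i in range(0, len(items), self.BLOCK_SIZE)]
        # index holds only the first key of each block (a range index)
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # last block whose first key <= key
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        for k, v in self.blocks[i]:  # scan a single block
            if k == key:
                return v
        return None
```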

(8)

Tablet

Contains some range of rows of the table

Built out of multiple SSTables

[Diagram: a tablet (Start: aardvark, End: apple) built from two SSTables, each consisting of 64K blocks plus an index]

(9)

Table

Multiple tablets make up the table

SSTables can be shared

Tablets do not overlap, SSTables can overlap

[Diagram: two tablets (aardvark to apple, and apple_two_E to boat) over four SSTables, with one SSTable shared between the tablets]

(10)

Finding a tablet

METADATA table stores tablet locations. Key: table id + end row; Data: location

Locations are cached at clients, which may detect cached entries to be stale

in which case a fresh lookup is performed up the hierarchy

Also prefetched (for range queries)

(11)

Servers

Tablet servers manage tablets, multiple tablets per server. Each tablet is 100-200 MB

Each tablet lives at only one server

Tablet server splits tablets that get too big

Master responsible for load balancing and fault

tolerance

(12)

Master’s Tasks

Use Chubby to monitor health of tablet servers, restart failed servers

Tablet server registers itself by acquiring a lock in a specific Chubby directory

Chubby gives “lease” on lock, must be renewed periodically

Server loses lock if it gets disconnected

Master monitors this directory to find which servers exist/are alive

If server not contactable/has lost lock, master grabs lock and reassigns tablets

GFS replicates data. Prefer to start tablet server on same machine that the data is already at

(13)

Master’s Tasks (Cont)

When (new) master starts

grabs master lock on chubby

Ensures only one master at a time

Finds live servers (scan chubby directory)

Communicates with servers to find assigned tablets

Scans metadata table to find all tablets

Keeps track of unassigned tablets and assigns them

Metadata root location obtained from Chubby; other metadata tablets found from the root

(14)

Metadata Management

Master handles

table creation, and merging of tablets

Tablet servers directly update metadata on tablet split, then notify master

a lost notification may be detected lazily by the master

(15)

Editing a table

Mutations are logged, then applied to an in-memory memtable

May contain “deletion” entries to handle updates

Group commit on log: collect multiple updates before log flush

[Diagram: insert and delete mutations for tablet apple_two_E to boat are appended to the tablet log, then applied to the in-memory memtable]
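The write path on this slide can be sketched as a toy memtable with a stand-in commit log (names invented; the real log lives in GFS and the memtable is sorted):

```python
class Memtable:
    """Toy write path: log the mutation first, then apply it in memory.
    Deletes become tombstone entries rather than removals, since the
    older values may live in immutable SSTables."""

    TOMBSTONE = object()  # deletion marker, dropped at major compaction

    def __init__(self):
        self.log = []   # stands in for the GFS commit log
        self.data = {}  # in-memory cell values

    def insert(self, key, value):
        self.log.append(("insert", key, value))  # durability first
        self.data[key] = value

    def delete(self, key):
        self.log.append(("delete", key))
        self.data[key] = self.TOMBSTONE  # a "deletion" entry, not a removal
```

Group commit fits naturally here: several appended log entries can be flushed to disk in one batch before the mutations are acknowledged.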

(16)

Compactions

Minor compaction – convert the memtable into an SSTable

Reduce memory usage

Reduce log traffic on restart

Merging compaction

Reduce number of SSTables

Good place to apply policy “keep only N versions”

Major compaction

Merging compaction that results in only one SSTable

No deletion records, only live data
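A merging compaction can be sketched as collapsing several runs into one, newest value per key winning; a major compaction additionally drops tombstones. This dict-based toy (names invented) ignores sorting and streaming, which the real implementation needs:

```python
TOMBSTONE = None  # stand-in deletion marker

def compact(runs, major=False):
    """runs: list of key->value dicts, ordered newest first.
    Returns a single merged run."""
    merged = {}
    for run in reversed(runs):  # apply oldest first so newer runs overwrite
        merged.update(run)
    if major:
        # only live data survives a major compaction: no deletion records
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return merged
```

A minor compaction is the degenerate case: the memtable alone becomes one new SSTable, with tombstones kept.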

(17)

Locality Groups

Group column families together into an SSTable

Avoid mingling data, e.g. page contents and page metadata

Can keep some groups all in memory

Can compress locality groups

Bloom Filters on SSTables in a locality group

bitmap on key/value hash, used to overestimate which records exist

avoid searching SSTable if bit not set
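The Bloom-filter bullets can be made concrete with a small sketch (sizes and hash scheme are arbitrary choices, not Bigtable's): a bitmap over key hashes that may report false positives but never false negatives, so a clear bit safely skips the SSTable search.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per key over an m-bit bitmap."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = 0  # integer used as a bitmap

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # True may be a false positive; False is definitive -> skip the SSTable
        return all(self.bits >> p & 1 for p in self._positions(key))
```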

Tablet movement

Major compaction (with concurrent updates)

Minor compaction (to catch up with updates) without any concurrent updates

(18)

Log Handling

Commit log is per server, not per tablet (why?)

complicates tablet movement

when server fails, tablets divided among multiple servers

can cause heavy scan load by each such server

optimization to avoid multiple separate scans: sort log by (table, rowname, LSN), so logs for a tablet are clustered, then distribute

GFS delay spikes can mess up log write (time critical)

solution: two separate logs, one active at a time
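The log-sorting optimization above amounts to choosing the right sort key; a toy illustration (entry tuples invented):

```python
# During recovery, each new server needs only its tablets' entries from the
# dead server's single shared commit log. Sorting by (table, row, sequence
# number) clusters each tablet's entries into one contiguous range, so every
# recovering server reads one range instead of scanning the whole log.
log = [
    ("tab2", "rowB", 3, "put v3"),
    ("tab1", "rowA", 1, "put v1"),
    ("tab2", "rowB", 2, "put v2"),
]
log.sort()  # tuple order = (table, row, LSN)
```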

(19)

Immutability

SSTables are immutable

simplifies caching, sharing across GFS etc

no need for concurrency control

SSTables of a tablet recorded in METADATA table

Garbage collection of SSTables done by master

On tablet split, the split tablets can start off quickly on shared SSTables, splitting them lazily

Only the memtable has concurrent reads and updates

(20)

Microbenchmarks

(21)
(22)

Application at Google

(23)

Lessons learned

Interesting point: only implement some of the requirements, since the rest is probably not needed

Many types of failure possible

Big systems need proper systems-level monitoring

Value simple designs

