Transaction Processing
Transaction Processing | 1/146 |
Where transaction processing fits in the DBMS:
Transactions | 2/146 |
A transaction is:
... Transactions | 3/146 |
A transaction
A transaction can end in two possible ways:
... Transactions | 4/146 |
A transaction is a DB state-change operation.
Assume that the code of the transaction
... Transactions | 5/146 |
Transactions execute on a collection of data that is
... Transactions | 6/146 |
If a transaction commits, must ensure that
Concurrency in DBMSs | 7/146 |
Concurrency in a multi-user DBMS like PostgreSQL:
ACID Properties | 8/146 |
Data integrity is assured if transactions satisfy the following:
Atomicity
... ACID Properties | 9/146 |
Atomicity can be represented by state-transitions:
COMMIT
ABORT
... ACID Properties | 10/146 |
Transaction processing:
Our discussion thus focusses on: Atomicity, Durability, Isolation
... ACID Properties | 11/146 |
Atomicity is handled by the commit and abort mechanisms
Durability is handled by implementing stable storage, via
Review of Transaction Terminology | 12/146 |
To describe transaction effects, we consider:
READ
WRITE
ABORT
COMMIT
SELECT
READ
UPDATE
DELETE
READ
WRITE
INSERT
WRITE
... Review of Transaction Terminology | 13/146 |
More on transactions and SQL
BEGIN
begin
COMMIT
END
end
ROLLBACK
ABORT
... Review of Transaction Terminology | 14/146 |
The READ
WRITE
ABORT
COMMIT
read item X in transaction T | ||
write item X in transaction T | ||
abort transaction T | ||
commit transaction T |
Schedules | 15/146 |
A schedule gives the sequence of operations that occur
For a single tx, there is a single schedule (its operations in order).
For a group of tx's, there are very many possible schedules.
... Schedules | 16/146 |
Consider a simple banking transaction, expressed in "PLpgSQL":
create function withdraw(Acct integer, Required float) returns text as $$ declare Bal float; begin select balance into Bal from Accounts where id=Acct;R Bal := Bal - Required; update Accounts set balance=Bal where id=Acct;W if (Bal < 0) then rollback; return 'Insufficient funds';A else commit; return 'New balance: '||Bal::text;C end if; end; $$ language plpgsql;
Notes:
begin
... Schedules | 17/146 |
If tx T = withdraw(
,
)
If tx T = withdraw(
,
)
withdraw(
,
)
some possible schedules are:
... Schedules | 18/146 |
Serial schedules have no interleave of operations from different tx's.
Why serial schedules are good:
... Schedules | 19/146 |
With different-ordered serial executions, tx's may get different results.
I.e. StateAfter(T1;T2) = StateAfter(T2;T1) is not generally true.
Consider the following two transactions:
T1 : select sum(salary) from Employee where dept='Sales' T2 : insert into Employee values (....,'Sales',...)
If we execute T1 then T2 we get a smaller salary total than if we execute T2 then T1.
In both cases, however, the salary total is consistent with the state of the database at the time the transaction is executed.
... Schedules | 20/146 |
A serial execution of consistent transactions is always consistent.
If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.
In the worst case, the effect of executing the transactions ...
... Schedules | 21/146 |
Not all concurrent executions cause problems.
For example, the schedules
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
or
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
or ...
leave the database in a consistent state.
Example Transaction #1 | 22/146 |
Problem: Allocate a seat on a plane flight
Implement as a function returning success status:
function allocSeat(paxID integer, flightID integer, seatNum string) returning boolean { check whether seat currently occupied if (already occupied) tell pax seat taken; return !ok else assign pax to seat; return ok }
Assume schema:
Flight(flightID, flightNum, flightTime, ...) SeatingAlloc(flightID, seatNum, paxID, ...)
... Example Transaction #1 | 23/146 |
PLpgSQL implementation for seat allocation:
create or replace function allocSeat(paxID int, fltID int, seat text) returns boolean as $$ declare pid int; begin select paxID into pid from SeatingAlloc where flightID = fltID and seatNum = seat; if (pid is not null) then return false;-- someone else already has seat else update SeatingAlloc set pax = paxID where flightID = fltID and seatNum = seat; commit; return true; end if; end; $$ langauge plpgsql;
... Example Transaction #1 | 24/146 |
If customer Cust1 executes allocSeat(Cust1,f,23B)
... Example Transaction #1 | 25/146 |
Consider two customers trying allocSeat(?,f,23B) simultaneously.
A possible order of operations ...
Serial execution (e.g. enforced by locks) could solve these kinds of problem.
Example Transaction #2 | 26/146 |
Problem: transfer funds between two accounts in same bank.
Implement as a function returning success status:
function transfer(sourceAcct integer, destAcct integer, amount real) returning boolean { check whether sourceAcct is valid check whether destAcct is valid check whether sourceAcct has sufficient funds (>= amount) if (all ok) { withdraw money from sourceAcct deposit money into destAcct } }
... Example Transaction #2 | 27/146 |
PLpgSQL for funds transfer between accounts:
create or replace function transfer(source int, dest int, amount float) returns boolean as $$ declare ok boolean := true; acct Accounts%rowtype; begin select * into acct from Accounts where id=source; if (not found) then raise warning 'Invalid Withdrawal Account'; ok := false; end if; select * from Accounts where id=dest; if (not found) then raise warning 'Invalid Deposit Account'; ok := false; end if; if (acct.balance < amount) then raise warning 'Insufficient funds'; ok := false; end if; if (not ok) then rollback; return false; end if; update Accounts set balance := balance - amount where id = source; update Accounts set balance := balance + amount where id = dest; commit; return true; end; $$ language plpgsql;
... Example Transaction #2 | 28/146 |
If customer transfers $1000 from Acct1 to Acct2
Similar to earlier problem; could be fixed by serialization.
... Example Transaction #2 | 29/146 |
Consider customer transfers $1000 from Acct1 to Acct2
Transaction Anomalies | 30/146 |
What problems can occur with concurrent transactions?
The set of phenomena can be characterised broadly under:
... Transaction Anomalies | 31/146 |
Dirty read: a transaction reads data written by a concurrent uncommitted transaction
Example:
Transaction T1 Transaction T2 (1) select a into X from R where id=1 (2) select a into Y from R where id=1 (3) update R set a=X+1 where id=1 (4) commit (5) update R set a=Y+1 where id=1 (6) commit
Effect: T1's update on R.a
... Transaction Anomalies | 32/146 |
Nonrepeatable read: a transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read)
Example:
Transaction T1 Transaction T2 (1) select * from R where id=5 (2) update R set a=8 where id=5 (3) commit (4) select * from R where id=5
Effect: T1 runs same query twice; sees different data
... Transaction Anomalies | 33/146 |
Phantom read: a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction
Example:
Transaction T1 Transaction T2 (1) select count(*) from R where a=5 (2) insert into R(id,a,b) values (2,5,8) (3) commit (4) select count(*) from R where a=5
Effect: T1 runs same query twice; sees different result set
Example of Transaction Failure | 34/146 |
The above examples generally assumed that transactions committed.
Additional problems can arise when transactions abort.
We give examples using the following two transactions:
T1: read(X) T2: read(X) X := X + N X := X + M write(X) write(X) read(Y) Y := Y - N write(Y)
and initial DB state X=100
Y=50
N=5
M=8
... Example of Transaction Failure | 35/146 |
Consider the following schedule where one transaction fails:
T1: R(X) W(X) A T2: R(X) W(X)
Transaction T1
X
The abort will rollback the changes to X
Consider three places where rollback might occur:
T1: R(X) W(X) A [1] [2] [3] T2: R(X) W(X)
Transaction Failure - Case 1 | 36/146 |
This scenario is ok. T1
Database Transaction T1 Transaction T2 --------- ------------------ -------------- X Y X Y X 100 50 ? ? ? read(X) 100 X:=X+N 105 105 write(X) abort 100 rollback read(X) 100 X:=X+M 108 108 write(X) --------- 108 50
Transaction Failure - Case 2 | 37/146 |
In this scenario, some of T1
Database Transaction T1 Transaction T2 --------- ------------------ -------------- X Y X Y X 100 50 ? ? ? read(X) 100 X:=X+N 105 105 write(X) abort read(X) 105 X:=X+M 113 100 rollback 113 write(X) --------- 113 50
Transaction Failure - Case 3 | 38/146 |
In this scenario, T2
Database Transaction T1 Transaction T2 --------- ------------------ -------------- X Y X Y X 100 50 ? ? ? read(X) 100 X:=X+N 105 105 write(X) abort read(X) 105 X:=X+M 113 113 write(X) 100 rollback --------- 100 50
Transaction Isolation |
Transaction Isolation | 40/146 |
If transactions always run "single-user":
... Transaction Isolation | 41/146 |
In practice, serial execution gives poor performance.
We need approaches that allow "safe" concurrency
DBMS Transaction Management | 42/146 |
Abstract view of DBMS concurrency mechanisms:
The Scheduler
Serializability | 43/146 |
Consider two schedules S1 and S2 produced by
... Serializability | 44/146 |
Two formulations of serializability:
i.e. there are VS schedules that are not CS, but not vice versa
Isolation and Concurrency Control | 45/146 |
It is not useful to
Other Properties of Schedules | 46/146 |
Above discussion explicitly assumes that all transactions commit.
What happens if some transaction aborts?
Under serial execution, there is no problem:
Recoverability | 47/146 |
Consider the serializable schedule:
T1: R(X) W(Y) C T2: W(X) A
(where the final value of Y
X
Notes:
... Recoverability | 48/146 |
Recoverable schedules avoid these kinds of problems.
For a schedule to be recoverable, we require additional constraints
Note that recoverability does not prevent "dirty reads".
In order to make schedules recoverable in the presence of dirty reads and aborts, may need to abort multiple transactions.
Cascading Aborts | 49/146 |
Recall the earlier non-recoverable schedule:
T1: R(X) W(Y) C T2: W(X) A
To make it recoverable requires:
T1: R(X) W(Y) ... A T2: W(X) A
Known as cascading aborts (or cascading rollback).
... Cascading Aborts | 50/146 |
Example: T3 aborts, causing T2 to abort, causing T1 to abort
T1: R(Y) W(Z) A T2: R(X) W(Y) A T3: W(X) A
Even though T1 has no direct connection with T3
(i.e. no shared data).
This kind of problem ...
... Cascading Aborts | 51/146 |
Cascading aborts can be avoided if
(alternative formulation: no tx can read data items written by an uncommitted tx)
Downside: reduces opportunity for concurrency.
GUW call these ACR (avoid cascading rollback) schedules.
All ACR schedules are also recoverable.
Strictness | 52/146 |
Strict schedules also eliminate the chance of writing dirty data.
A schedule is strict if
... Strictness | 53/146 |
Example: non-strict schedule
T1: W(X) A T2: W(X) A
Problems with handling rollback after aborts:
Schedule Properties | 54/146 |
Relationship between various classes of schedules:
Schedules ought to be serializable and strict.
But more serializable/strict ⇒ less concurrency.
DBMSs allow users to trade off "safety" against performance.
Transaction Isolation Levels | 55/146 |
Previous examples showed:
... Transaction Isolation Levels | 56/146 |
In many real applications, there is either
... Transaction Isolation Levels | 57/146 |
SQL provides a mechanism for database programmers to specify how much isolation to apply to transactions
set transaction read only-- so weaker isolation may be ok read write-- suggests stronger isolation needed isolation level-- weakest isolation, maximum concurrency read uncommitted read committed repeatable read serializable-- strongest isolation, minimum concurrency
... Transaction Isolation Levels | 58/146 |
Meaning of transaction isolation levels:
Isolation Dirty Nonrepeatable Phantom Level Read Read Read Read uncommitted Possible Possible Possible Read committed Not possible Possible Possible Repeatable read Not possible Not possible Possible Serializable Not possible Not possible Not possible
... Transaction Isolation Levels | 59/146 |
For transaction isolation, PostgreSQL
... Transaction Isolation Levels | 60/146 |
A PostgreSQL tx consists of a sequence of SQL statements:
BEGIN S1; S2; ... Sn; COMMIT;
Isolation levels affect view of DB provided to each Si:
... Transaction Isolation Levels | 61/146 |
Using PostgreSQL's serializable isolation level, a select
Implementing Concurrency Control |
Concurrency Control | 63/146 |
Approaches to concurrency control:
Synchronise transaction execution via locks on relevant part of DB.
Allow multiple consistent versions of the data to exist.
Each transaction has exclusive access to one version (the one when tx started).
Execute all transactions; check for validity problems just before commit.
Organise transaction execution via timestamps assigned to actions.
Lock-based Concurrency Control | 64/146 |
Locks introduce additional mechanisms in DBMS:
The Scheduler
... Lock-based Concurrency Control | 65/146 |
Lock table entries contain:
Lock upgrade:
... Lock-based Concurrency Control | 66/146 |
Synchronise access to shared data items via following rules:
... Lock-based Concurrency Control | 67/146 |
Consider the following schedule, using locks:
T1: Lr(Y) R(Y) U(Y) Lw(X) W(X) U(X) T2: Lr(X) R(X) U(X) Lw(Y)....W(Y) U(Y)
(where Lr = read-lock, Lw = write-lock, U = unlock)
Locks correctly ensure controlled access to shared objects (X
Y
Despite this, the schedule is not serializable.
Two-Phase Locking | 68/146 |
To guarantee serializability, we require an additional constraint on how locks are applied:
... Two-Phase Locking | 69/146 |
Consider the following two transactions:
T1: Lw(A) R(A) F1 W(A) Lw(B) U(A) R(B) G1 W(B) U(B) T2: Lw(A) R(A) F2 W(A) Lw(B) U(A) R(B) G2 W(B) U(B)
They follow 2PL protocol, inducing a schedule like:
T1(a): Lw(A) R(A) F1 W(A) Lw(B) U(A) T2(a): Lw(A) ...................... R(A) F2 W(A) T1(b): R(B) G1 W(B) U(B) T2(b): Lw(B) ............ U(A) R(B) G2 W(B) U(B)
Problems with Locking | 70/146 |
Appropriate locking can guarantee correctness.
However, it also introduces potential undesirable effects:
No transactions can proceed; each waiting on lock held by another.
One transaction is permanently "frozen out" of access to data.
Locking introduces delays while waiting for locks to be released.
Deadlock | 71/146 |
Deadlock occurs when two transactions are waiting for a lock on an item held by the other.
Example:
T1: Lw(A) R(A) Lw(B) ...... T2: Lw(B) R(B) Lw(A) .....
How to deal with deadlock?
... Deadlock | 72/146 |
Handling deadlock involves forcing a transaction to "back off".
... Deadlock | 73/146 |
Simple approach: timeout
... Deadlock | 74/146 |
A cycle in the waits-for graph indicates a deadlock.
Could prevent deadlock by
... Deadlock | 75/146 |
Alternative deadlock handling: timestamps.
Each tx is permanently allocated a unique timestamp (e.g. start-time).
When Tj tries to get lock held by Tk
... Deadlock | 76/146 |
Tj tries to get lock held by Tk ...
Wait-die scheme:
... Deadlock | 77/146 |
Properties of deadlock handling methods:
Starvation | 78/146 |
Starvation occurs when one transaction
Multiple locks ⇒ need to decide which to release first.
Solutions:
Locking Granularity | 79/146 |
Locking typically reduces concurrency ⇒ reduces throughput.
Granularity of locking can impact performance:
+ lock a small item ⇒ more of database accessible
+ lock a small item ⇒ quick update ⇒ quick lock release
- lock small items ⇒ more locks ⇒ more lock management
Granularity levels: field, row (tuple), table, whole database
... Locking Granularity | 80/146 |
Multiple-granularity locking protocol:
Locking in B-trees | 81/146 |
How to lock a B-tree leaf node?
One possibility:
If for insert/delete, locks would be write locks.
This approach gives poor performance (lock on root is bottleneck).
... Locking in B-trees | 82/146 |
Approach for searching (select) ...
Optimistic Concurrency Control | 83/146 |
Locking is a pessimistic approach to concurrency control:
In systems where read:write ratio is very high
... Optimistic Concurrency Control | 84/146 |
Transactions have three distinct phases:
... Optimistic Concurrency Control | 85/146 |
Data structures needed for validation:
... Optimistic Concurrency Control | 86/146 |
Validation check for transaction T
What this prevents ...
... Optimistic Concurrency Control | 87/146 |
Problems with optimistic concurrency control:
Multi-version Concurrency Control | 88/146 |
Multi-version concurrency control (MVCC) aims to
... Multi-version Concurrency Control | 89/146 |
Each transaction is
Each record in the database is
... Multi-version Concurrency Control | 90/146 |
When a reader Ti is accessing the database
Concurrency Control in SQL | 91/146 |
Transactions in SQL are specified by
BEGIN
COMMIT
ROLLBACK
... Concurrency Control in SQL | 92/146 |
More fine-grained control of "undo" via savepoints:
SAVEPOINT
ROLLBACK TO SAVEPOINT
begin; insert into numbersTable values (1); savepoint my_savepoint; insert into numbersTable values (2); rollback to savepoint my_savepoint; insert into numbersTable values (3); commit;
will insert 1 and 3 into the table, but not 2.
... Concurrency Control in SQL | 93/146 |
SQL standard defines four levels of transaction isolation.
PostgreSQL implements: repeatable-read = serializable, read-uncommitted = read-committed
... Concurrency Control in SQL | 94/146 |
Using the serializable isolation level, a select
... Concurrency Control in SQL | 95/146 |
Explicit control of concurrent access is available, e.g.
Table-level locking: LOCK TABLE
ALTER TABLE
Row-level locking: SELECT FOR UPDATE
DELETE
Concurrency Control in PostgreSQL | 96/146 |
PostgreSQL uses two styles of concurrency control:
select
create table
LOCK
... Concurrency Control in PostgreSQL | 97/146 |
Implementing MVCC in PostgreSQL requires:
xmin
xmax
xnew
... Concurrency Control in PostgreSQL | 98/146 |
Rules for a tuple to be visible to Tj:
xmin
xmax
... Concurrency Control in PostgreSQL | 99/146 |
A transaction sees a consistent view of the database, but may not see the "current" view of the database.
E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples
This is OK unless tx's communicate outside the database system.
For applications that require that every transaction accesses the current consistent version of the data, explicit locks are available.
Locks are available at various granluarities:
LOCK TABLE
SELECT FOR UPDATE
Implementing Atomicity/Durability |
Atomicity/Durability | 101/146 |
Reminder:
Transactions are atomic
Durability | 102/146 |
What kinds of "system failures" do we need to deal with?
postgres
... Durability | 103/146 |
Consider following scenario:
Desired behaviour after system restart:
... Durability | 104/146 |
Durabilty begins with a stable disk storage subsystem.
Operations:
putBlock(Buffer *b)
getBlock(Buffer *b)
putBlock
getBlock
... Durability | 105/146 |
Implementation of transaction operations R(V) and W(V)
Value R(Object V) { B = getBuf(blockContaining(V)) return value of V from B } void W(Object V, value k) { B = getBuf(blockContaining(V)) set value of V in B }
Note:
putBuf()
... Durability | 106/146 |
getBuf()
putBuf()
Buffer getBuf(BlockAddr A) { if (!inBufferPool(A)) { B = availableBuffer(A); getBlock(B); } return bufferContaining(A); } void putBuf(BlockAddr A) { B = bufferContaining(A); putBlock(B); }
Stable Store | 107/146 |
One simple strategy using redundancy: stable store.
Protects against all failures in a single disk sector.
Each logical disk page X is stored twice.
(Obvious disadvantage: disk capacity is halved)
X is stored in sector S on disk L and sector T on disk R
Assume that a sound parity-check is available
(i.e. can always recognise whether data has transferred mem↔disk correctly)
... Stable Store | 108/146 |
Low level sector i/o functions:
int writeSector(char *b, Disk d, Sector s) { int nTries = 0; do { nTries++; write(d,s,b); } while (bad(parity) && nTries < MaxTries) return nTries; } int readSector(char *b, Disk d, Sector s) { int nTries = 0; do { nTries++; read(d,s,b); } while (bad(parity) && nTries < MaxTries) return nTries; }
... Stable Store | 109/146 |
Writing data to disk with stable store:
int writeBlock(Buffer *b, Disk d, Sector s) { int sec; for (;;) { sec = (s > 0) ? s : getUnusedSector(d); n = writeSector(b->data, d, sec); if (n == maxTries) mark s as unusable else return sec; } }
writeBlock
... Stable Store | 110/146 |
Reading data from disk with stable store:
int readBlock(Buffer *b) { int n = readSector(b->data, b->diskL, b->sectorL); if (n == maxTries) { n = readSector(b->data, b->diskR, b->sectorR); if (n == maxTries) return -1;// read failure } return 0;// successful read }
... Stable Store | 111/146 |
Consider scenario where power fails during write operation:
writeBlock(X,L,S); writeBlock(X,R,T);
... Stable Store | 112/146 |
How stable storage handles failure during writing:
RAID | 113/146 |
RAID gives techniques to achieve
Stable Storage Subsystem | 114/146 |
We can prevent/minimise loss/corruption of data due to:
Dealing with Transactions | 115/146 |
The remaining "failure modes" that we need to consider:
ABORT
Architecture for Atomicity/Durability | 116/146 |
How does a DBMS provide for atomicity/durability?
Execution of Transactions | 117/146 |
Transactions deal with three address spaces:
... Execution of Transactions | 118/146 |
Operations available for data transfer:
INPUT(X)
X
READ(X,v)
X
v
WRITE(X,v)
v
X
OUTPUT(X)
X
READ/WRITE
INPUT/OUTPUT
... Execution of Transactions | 119/146 |
Example of transaction execution:
-- implements A = A*2; B = B+1; BEGIN READ(A,v); v = v*2; WRITE(A,v); READ(B,v); v = v+1; WRITE(B,v); COMMIT
READ
INPUT
COMMIT
... Execution of Transactions | 120/146 |
States as the transaction executes:
t Action v Buf(A) Buf(B) Disk(A) Disk(B) ----------------------------------------------------- (0) BEGIN . . . 8 5 (1) READ(A,v) 8 8 . 8 5 (2) v = v*2 16 8 . 8 5 (3) WRITE(A,v) 16 16 . 8 5 (4) READ(B,v) 5 16 5 8 5 (5) v = v+1 6 16 5 8 5 (6) WRITE(B,v) 6 16 6 8 5 (7) OUTPUT(A) 6 16 6 16 5 (8) OUTPUT(B) 6 16 6 16 6
After tx completes, we must have either
Disk(A)=8,Disk(B)=5 or Disk(A)=16,Disk(B)=6
If system crashes before (8), may need to undo disk changes.
If system crashes after (8), may need to redo disk changes.
Transactions and Buffer Pool | 121/146 |
Two issues arise w.r.t. buffers:
OUTPUT
WRITE
... Transactions and Buffer Pool | 122/146 |
Handling stealing:
Logging | 123/146 |
Three "styles" of logging
Undo Logging | 124/146 |
Simple form of logging which ensures atomicity.
Log file consists of a sequence of small records:
<START T>
T
<COMMIT T>
T
<ABORT T>
T
<T,X,v>
T
X
v
WRITE
OUTPUT
... Undo Logging | 125/146 |
Data must be written to disk in the following order:
<T,X,v>
X
X
... Undo Logging | 126/146 |
For the example transaction, we would get:
t Action v B(A) B(B) D(A) D(B) Log -------------------------------------------------------- (0) BEGIN . . . 8 5 <START T> (1) READ(A,v) 8 8 . 8 5 (2) v = v*2 16 8 . 8 5 (3) WRITE(A,v) 16 16 . 8 5 <T,A,8> (4) READ(B,v) 5 16 5 8 5 (5) v = v+1 6 16 5 8 5 (6) WRITE(B,v) 6 16 6 8 5 <T,B,5> (7) FlushLog (8) StartCommit (9) OUTPUT(A) 6 16 6 16 5 (10) OUTPUT(B) 6 16 6 16 6 (11) EndCommit <COMMIT T> (12) FlushLog
Note that T is not regarded as committed until (11).
... Undo Logging | 127/146 |
Simplified view of recovery using UNDO logging:
committedTrans = abortedTrans = startedTrans = {} for each log record from most recent to oldest { switch (log record) { <COMMIT T> : add T to committedTrans <ABORT T> : add T to abortedTrans <START T> : add T to startedTrans <T,X,v> : if (T in committedTrans)// don't undo committed changes else { WRITE(X,v); OUTPUT(X) } } } for each T in startedTrans { if (T in committedTrans) ignore else if (T in abortedTrans) ignore else write <ABORT T> to log } flush log
... Undo Logging | 128/146 |
Recall example transaction and consider effects of system crash at the following points:
Before (9) ... disk "restored" (unchanged); <ABORT T>
(9)-(11) ... disk restored to original state; <ABORT T>
After (12) ... A and B left unchanged; T
"Disk restored" means
WRITE(B,5); OUTPUT(B); WRITE(A,8); OUTPUT(A);
Checkpointing | 129/146 |
Previous view of recovery implied reading entire log file.
Since log file grows without bound, this is infeasible.
Eventually we can delete "old" section of log.
... Checkpointing | 130/146 |
Problem: many concurrent/overlapping transactions.
How to know that all have finished?
Simplistic approach
<COMMIT T>
<ABORT T>
<CHKPT>
... Checkpointing | 131/146 |
Obvious problem with quiescent checkpointing
<CHKPT (T1,..,Tk)>
T1,..,Tk
<ENDCHKPT>
... Checkpointing | 132/146 |
Recovery: scan backwards through log file processing as before.
Determining where to stop depends on ...
If we encounter <ENDCHKPT>
<CHKPT...>
<CHKPT...>
<CHKPT (T1,..,Tk)>
T1,..,Tk
Redo Logging | 133/146 |
Problem with UNDO logging:
... Redo Logging | 134/146 |
Requirement for redo logging: write-ahead rule.
Data must be written to disk as follows:
<T,X,v'>
v'
X
... Redo Logging | 135/146 |
For the example transaction, we would get:
t Action v B(A) B(B) D(A) D(B) Log -------------------------------------------------------- (0) BEGIN . . . 8 5 <START T> (1) READ(A,v) 8 8 . 8 5 (2) v = v*2 16 8 . 8 5 (3) WRITE(A,v) 16 16 . 8 5 <T,A,16> (4) READ(B,v) 5 16 5 8 5 (5) v = v+1 6 16 5 8 5 (6) WRITE(B,v) 6 16 6 8 5 <T,B,6> (7) COMMIT <COMMIT T> (8) FlushLog (9) OUTPUT(A) 6 16 6 16 5 (10) OUTPUT(B) 6 16 6 16 6
Note that T is regarded as committed as soon as (8) completes.
... Redo Logging | 136/146 |
Simplified view of recovery using REDO logging:
set committedTrans// e.g. from tx table for each log record from oldest to most recent { switch (log record) { <COMMIT T> : ignore// know these already <ABORT T> : add T to abortedTrans <START T> : add T to startedTrans <T,X,v'> : if (T in committedTrans) { WRITE(X,v'); OUTPUT(X) } else// nothing to do, no change on disk } } for each T in startedTrans { if (T in committedTrans) ignore else if (T in abortedTrans) ignore else write <ABORT T> to log } flush log
... Redo Logging | 137/146 |
Data required for REDO logging checkpoints:
... Redo Logging | 138/146 |
Checkpoints in REDO logging use (as before):
<CHKPT (T1,..,Tk)>
<ENDCHKPT>
<CHKPT (T1,..,Tk)>
<CHKPT...>
<ENDCHKPT>
... Redo Logging | 139/146 |
Recovery with checkpointed REDO log.
As for UNDO logging, two cases ...
Last checkpoint record before crash is <ENDCHKPT>
<CHKPT...>
T1,..,Tk
<CHKPT...>
<CHKPT (T1,..,Tk)>
<ENDCHKPT>
Undo/Redo Logging | 140/146 |
UNDO logging and REDO logging are incompatible in
<COMMIT T>
<T,X,v,v'>
X
... Undo/Redo Logging | 141/146 |
Requirement for undo/redo logging: write-ahead.
Data must be written to disk as follows:
<COMMIT T>
... Undo/Redo Logging | 142/146 |
For the example transaction, we might get:
t Action v B(A) B(B) D(A) D(B) Log -------------------------------------------------------- (0) BEGIN . . . 8 5 <START T> (1) READ(A,v) 8 8 . 8 5 (2) v = v*2 16 8 . 8 5 (3) WRITE(A,v) 16 16 . 8 5 <T,A,8,16> (4) READ(B,v) 5 16 5 8 5 (5) v = v+1 6 16 5 8 5 (6) WRITE(B,v) 6 16 6 8 5 <T,B,5,6> (7) FlushLog (8) StartCommit (9) OUTPUT(A) 6 16 6 16 5 (10) <COMMIT T> (11) OUTPUT(B) 6 16 6 16 6
Note that T is regarded as committed as soon as (10) completes.
... Undo/Redo Logging | 143/146 |
Recovery using undo/redo logging:
Before (10) ... treat as incomplete; undo all changes
After (10) ... treat as complete; redo all changes
... Undo/Redo Logging | 144/146 |
Steps in UNDO/REDO log checkpointing
<CHKPT (T1,..,Tk)>
<ENDCHKPT>
A consequence of the above:
COMMIT
... Undo/Redo Logging | 145/146 |
The above description simplifies details of undo/redo logging.
Aries is a complete algorithm for undo/redo logging.
Differences to what we have described:
<END>
<CHKPT>
<ENDCHKPT..>
Recovery in PostgreSQL | 146/146 |
PostgreSQL uses write-ahead undo/redo style logging.
However, it also uses multi-version concurrency control
Core transaction code is in src/backend/access/transam
Produced: 15 May 2016