Exploiting Asynchronous IO using the Asynchronous Iterator Model
Suresh Iyengar * S. Sudarshan Santosh Kumar
#Raja Agrawal
&IIT Bombay
Agenda
AIO Background
Exploiting AIO in query processing
• Asynchronous Iterator model
Asynchronous Index Nested Loops Join
Asynchronous versions of other operators
Performance results
Related Work
Conclusion
Application Kernel
Read ()
System call
Initiate IO Read response Context switch
Application Blocked !
IO Processing : Traditional way
data
Application Kernel
AIO Read ()
System call
Initiate IO
Read response
IO Processing : Async. way
Notify
Do other
datawork !!
IO Processing : Async. Way
Asynchronous approach
• Overlap of CPU and IO processing
• Application can generate multiple IO requests
Allows IO subsystem to reorder access to data on disk
• Important in RAID environments
Asynchronous IO Interface
aio_read ( aio structure) Request an AIO read operation
aio_error ( aio structure ) Check the status of an AIO request lio_listio ( array of aio
structures )
Initiate a list of AIO operations
• We use list AIO in our implementation
• Can initiate multiple IO read operations in one system call
( File descriptor, offset, buffer, numBytes, … )
Linux 2.6 kernel
Handling AIO completion
Signal-based handler
• A signal is generated on IO completion
Callback using interrupts
• An interrupt is generated on IO completion
Concurrent access to completion handler and shared data structures in both of above methods
Polling
• Store IO requests in pending queue and poll periodically for completion
• Our experiments show polling beats signal/interrupt based approach
Call completion handler
Demand-Driven Iterator
Bottom level nodes perform operations such as sequential scans or index scans.
Upper level nodes are join nodes or other operator nodes such as sort or aggregate.
NL J scan
Table A Table B
• Open()
• Next()
• Close()
Blocking call ! scan
AIO Background
Exploiting AIO in query processing
• Asynchronous Iterator model
Asynchronous Index Nested Loop (INL) Joins
Asynchronous versions of other operators
Performance results
Related Work
Conclusion
Agenda
NL J scan
Table A Table B scan
• Open()
• Next()
• Close()
Asynchronous Iterator
•
I don’t have the tuple available in the memory !!
• Issue AIO read operation
• Return “LATER”
Non- Blocking call !
Asynchronous Iterator Model (AIM)
Allow a node to return a status “LATER” to the parent
• Instead of blocking for IO completion.
The parent operator could
• Perform other work, such as fetching data from another input
• Simply return a LATER status to its parent node
• Or just loop, reinvoking the child operator till it returns a tuple
E.g. root of the execution plan tree
Exact action depends on operator
• Asynchronous versions of different operators
• Focus on Asynchronous Indexed Nested Loops join
Asynchronous INL Joins
Original state of Indexed Nested Loops (INL) node
• Left and right subplans and qualifier lists
Augmented state for async INL node
• An array of outer tuples each having a queue of matching inner TIDs
AIO may have been issued for some already, others later
• A workqueue for outer slots which already have AIO issued for their matching inner TIDS
• An IO queue recording all pending AIO requests made by the node
Used to poll for completion of AIO requests
Asynchronous INL Join (contd.)
We divide the async INL join operations into two stages
• Stage 1: Fetch outer tuples and issues AIO requests
• Stage 2: Check for AIO completion, process AIO results and return join results.
Stages are interleaved
• Stage 1 may be in progress for some tuples, and Stage 2
for others
Asynchronous INL Join (contd.)
Fetch outer tuples
Find the matching inner TIDs for each outer tuple
Put the outer tuple in workqueue
For each outer tuple
Issue LIST AIO for matching inner TIDS of all outer tuples in workqueue
Stage 1
Asynchronous INL Join (contd.)
Rules
• Batch size
BATCH_SIZE: max number of outstanding AIO requests
Why? OS limits, efficiency issues
We set the MAX_BATCH_SIZE per node to 200 in our experiments
Scale BATCH_SIZE in powers of 2 till MAX_BATCH_SIZE so that async INL can output tuples quickly at the onset
• Case where outer tuple matches a large number of inner tuples is handled appropriately
• Keeping the AIO queue filled
We issue further AIO requests (fetching outer tuples as
required) if 10 % of earlier AIO requests have completed
Asynchronous INL Join (contd.)
For each outer tuple in workqueue
Stage 2
• Remove that inner TID from outer tuple’s TID array
• Perform join and add to result
• if join result found break from loop Check if any matching inner
TIDs are present in memory
Present ? Yes
Update workqueue Next page ..
No
Asynchronous INL Join (contd.)
Any join results?
Poll for AIO completion
Is tuple found or parent node cannot handle LATER
Is no outstanding outer tuples &
reached end of outer tuple
No
Yes
tupStat = END_OF_RESULT
result = NULL tupstat = LATER
result = NULL
Back to start of Stage 2
No
Yes Prev page..
Yes
No
Return result to parent node
Async. versions of other operators
Async Sequential scan
• Check if next tuple is in the in-memory buffer
• If its present, return the tuple
• Else initiate an async read. Set tupStat = LATER and return
Out of order sequential scan
• Start returning the tuples of a particular relation which are already there in the memory
even if out of order
• Concurrently, issue AIO for other tuples
Async. versions of other operators
Merge Join
sort sort
Seq scan Seq scan
T1 T2
I can start the sorting of other input !
LATER
LATER
Initiate AIO read
Performance Results
Experiments with TPC-H database with scale factors of 1 and 10 in three different setups
• Core 2 duo P4 with:
1GB RAM and TPC-H - 1 GB database (single disk)
1GB RAM and TPC-H – 10 GB database (single disk)
3.2GB RAM and TPC-H – 10 GB database (4 disks / RAID 10)
We use PostgreSQL 8.1.3 as the code base
Compare it with our modified version of the same code base, incorporating asynchronous iterator model
• with async INL and async seq. scan
Performance Results: 1GB RAM
Query 1a: select l_orderkey, l_quantity from orders, lineitem
where o_orderkey=l_orderkey and l_orderkey%100=2 and l_linestatus=’F’
TPCH 1 GB
TPCH 10 GB
Performance Results: 1 GB RAM
Query 2a: select l_orderkey,l_quantity from orders,lineitem,customer where o_orderkey=l_orderkey and o_custkey=c_custkey
and l_orderkey%100=2 and l_linestatus=’F’
TPCH 1 GB
TPCH 10 GB
Performance Results : 1GB RAM
Query 2a : Join of orders, lineitem and customer with filter (TPCH 1GB )
Startup effect
Performance Results: 1 GB RAM
Query 2b: select l_orderkey,l_quantity from myorders,lineitem,customer where o_orderkey=l_orderkey and o_custkey=c_custkey
TPCH
1 GB TPCH
10 GB
-- No tight selection
Performance Results: 3.2 GB + RAID
Query 1a : Join of orders and lineitem with filter
Query 2a : Join of orders, lineitem and customer with filter
Performance Results: 3.2 GB + RAID
Query 1b : Join of myorders, lineitem Query 2b : Join of myorders, lineitem and customer
Performance Results
TPC-H Q12: select l_shipmode,sum(...) from orders,lineitem
where o_orderkey = l_orderkey and <several selection>
group by l_shipmode order by l_shipmode
Original INL Async INL Gain TPCH 1GB
1GB RAM
64.7 sec 48 sec 25 %
TPCH 10 GB 1GB RAM
687 sec 431 sec 37 %
TPCD 10GB RAID 10
4 disks, 3.2 GB RAM
164 sec 147 sec 10 %
Related Work
Graefe’s generalized spool iterator (Graefe [ BTW03 ])
INL
Spool operator
scan Index
lookup
• Pre-fetches multiple outer tuples
• Issue AIO for matching inner TIDS
• Can be replenished
when empty or when
one tuple is joined
Related Work
AIO used in database products
• Microsoft SQL Server, IBM DB2, Oracle
• No public documentation on how these systems use AIO
Asynchronous iteration for evaluating web queries (R.Goldman and J. Widom [ SIGMOD 2000 ] )
• They report results only on web queries
Conclusion
Proposed the Asynchronous Iterator Model (AIM)
Presented asynchronous versions of INL and some operators
Showed gains of over 50 % in some cases
AIM can be useful in web-service access and in data integration systems like IBM DataJoiner
Future work
• Implementing async versions for index lookup, sub plan, sort and merge operator
• Performing async IO in the presence of ordering
constraints
Thank You
Questions ?
Plans
Query 1a :
• Seq scan on lineitem, probe on orders
• Merge Join
-> Index Scan on orders -> Sort lineitem
-> Seq Scan on lineitem
Query 2a:
• Nested Loop
-> Nested Loop
-> Seq Scan on lineitem -> Index Scan on orders -> Index Scan on customer