• No results found

Exploiting Asynchronous IO using the Asynchronous Iterator Model

N/A
N/A
Protected

Academic year: 2023

Share "Exploiting Asynchronous IO using the Asynchronous Iterator Model "

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Exploiting Asynchronous IO using the Asynchronous Iterator Model

Suresh Iyengar * S. Sudarshan Santosh Kumar

#

Raja Agrawal

&

IIT Bombay

(2)

Agenda

AIO Background

 Exploiting AIO in query processing

• Asynchronous Iterator model

 Asynchronous Index Nested Loops Join

 Asynchronous versions of other operators

 Performance results

 Related Work

 Conclusion

(3)

Application Kernel

Read ()

System call

Initiate IO Read response Context switch

Application Blocked !

IO Processing : Traditional way

data

(4)

Application Kernel

AIO Read ()

System call

Initiate IO

Read response

IO Processing : Async. way

Notify

Do other

data

work !!

(5)

IO Processing : Async. Way

Asynchronous approach

• Overlap of CPU and IO processing

• Application can generate multiple IO requests

 Allows IO subsystem to reorder access to data on disk

• Important in RAID environments

(6)

Asynchronous IO Interface

aio_read ( aio structure) Request an AIO read operation

aio_error ( aio structure ) Check the status of an AIO request lio_listio ( array of aio

structures )

Initiate a list of AIO operations

• We use list AIO in our implementation

• Can initiate multiple IO read operations in one system call

( File descriptor, offset, buffer, numBytes, … )

Linux 2.6 kernel

(7)

Handling AIO completion

 Signal-based handler

• A signal is generated on IO completion

 Callback using interrupts

• An interrupt is generated on IO completion

 Concurrent access to completion handler and shared data structures in both of above methods

 Polling

• Store IO requests in pending queue and poll periodically for completion

• Our experiments show polling beats signal/interrupt based approach

Call completion handler

(8)

Demand-Driven Iterator

 Bottom level nodes perform operations such as sequential scans or index scans.

 Upper level nodes are join nodes or other operator nodes such as sort or aggregate.

NL J scan

Table A Table B

• Open()

• Next()

• Close()

Blocking call ! scan

(9)

 AIO Background

Exploiting AIO in query processing

Asynchronous Iterator model

Asynchronous Index Nested Loop (INL) Joins

Asynchronous versions of other operators

 Performance results

 Related Work

 Conclusion

Agenda

(10)

NL J scan

Table A Table B scan

• Open()

• Next()

• Close()

Asynchronous Iterator

I don’t have the tuple available in the memory !!

• Issue AIO read operation

• Return “LATER”

Non- Blocking call !

(11)

Asynchronous Iterator Model (AIM)

 Allow a node to return a status “LATER” to the parent

• Instead of blocking for IO completion.

 The parent operator could

• Perform other work, such as fetching data from another input

• Simply return a LATER status to its parent node

• Or just loop, reinvoking the child operator till it returns a tuple

 E.g. root of the execution plan tree

 Exact action depends on operator

• Asynchronous versions of different operators

• Focus on Asynchronous Indexed Nested Loops join

(12)

Asynchronous INL Joins

 Original state of Indexed Nested Loops (INL) node

• Left and right subplans and qualifier lists

 Augmented state for async INL node

• An array of outer tuples each having a queue of matching inner TIDs

 AIO may have been issued for some already, others later

• A workqueue for outer slots which already have AIO issued for their matching inner TIDS

• An IO queue recording all pending AIO requests made by the node

 Used to poll for completion of AIO requests

(13)

Asynchronous INL Join (contd.)

 We divide the async INL join operations into two stages

• Stage 1: Fetch outer tuples and issues AIO requests

• Stage 2: Check for AIO completion, process AIO results and return join results.

 Stages are interleaved

• Stage 1 may be in progress for some tuples, and Stage 2

for others

(14)

Asynchronous INL Join (contd.)

Fetch outer tuples

Find the matching inner TIDs for each outer tuple

Put the outer tuple in workqueue

For each outer tuple

Issue LIST AIO for matching inner TIDS of all outer tuples in workqueue

Stage 1

(15)

Asynchronous INL Join (contd.)

 Rules

• Batch size

 BATCH_SIZE: max number of outstanding AIO requests

 Why? OS limits, efficiency issues

 We set the MAX_BATCH_SIZE per node to 200 in our experiments

 Scale BATCH_SIZE in powers of 2 till MAX_BATCH_SIZE so that async INL can output tuples quickly at the onset

• Case where outer tuple matches a large number of inner tuples is handled appropriately

• Keeping the AIO queue filled

 We issue further AIO requests (fetching outer tuples as

required) if 10 % of earlier AIO requests have completed

(16)

Asynchronous INL Join (contd.)

For each outer tuple in workqueue

Stage 2

• Remove that inner TID from outer tuple’s TID array

• Perform join and add to result

• if join result found break from loop Check if any matching inner

TIDs are present in memory

Present ? Yes

Update workqueue Next page ..

No

(17)

Asynchronous INL Join (contd.)

Any join results?

Poll for AIO completion

Is tuple found or parent node cannot handle LATER

Is no outstanding outer tuples &

reached end of outer tuple

No

Yes

tupStat = END_OF_RESULT

result = NULL tupstat = LATER

result = NULL

Back to start of Stage 2

No

Yes Prev page..

Yes

No

Return result to parent node

(18)

Async. versions of other operators

 Async Sequential scan

• Check if next tuple is in the in-memory buffer

• If its present, return the tuple

• Else initiate an async read. Set tupStat = LATER and return

 Out of order sequential scan

• Start returning the tuples of a particular relation which are already there in the memory

 even if out of order

• Concurrently, issue AIO for other tuples

(19)

Async. versions of other operators

Merge Join

sort sort

Seq scan Seq scan

T1 T2

I can start the sorting of other input !

LATER

LATER

Initiate AIO read

(20)

Performance Results

 Experiments with TPC-H database with scale factors of 1 and 10 in three different setups

• Core 2 duo P4 with:

 1GB RAM and TPC-H - 1 GB database (single disk)

 1GB RAM and TPC-H – 10 GB database (single disk)

 3.2GB RAM and TPC-H – 10 GB database (4 disks / RAID 10)

 We use PostgreSQL 8.1.3 as the code base

 Compare it with our modified version of the same code base, incorporating asynchronous iterator model

• with async INL and async seq. scan

(21)

Performance Results: 1GB RAM

Query 1a: select l_orderkey, l_quantity from orders, lineitem

where o_orderkey=l_orderkey and l_orderkey%100=2 and l_linestatus=’F’

TPCH 1 GB

TPCH 10 GB

(22)

Performance Results: 1 GB RAM

Query 2a: select l_orderkey,l_quantity from orders,lineitem,customer where o_orderkey=l_orderkey and o_custkey=c_custkey

and l_orderkey%100=2 and l_linestatus=’F’

TPCH 1 GB

TPCH 10 GB

(23)

Performance Results : 1GB RAM

Query 2a : Join of orders, lineitem and customer with filter (TPCH 1GB )

Startup effect

(24)

Performance Results: 1 GB RAM

Query 2b: select l_orderkey,l_quantity from myorders,lineitem,customer where o_orderkey=l_orderkey and o_custkey=c_custkey

TPCH

1 GB TPCH

10 GB

-- No tight selection

(25)

Performance Results: 3.2 GB + RAID

Query 1a : Join of orders and lineitem with filter

Query 2a : Join of orders, lineitem and customer with filter

(26)

Performance Results: 3.2 GB + RAID

Query 1b : Join of myorders, lineitem Query 2b : Join of myorders, lineitem and customer

(27)

Performance Results

TPC-H Q12: select l_shipmode,sum(...) from orders,lineitem

where o_orderkey = l_orderkey and <several selection>

group by l_shipmode order by l_shipmode

Original INL Async INL Gain TPCH 1GB

1GB RAM

64.7 sec 48 sec 25 %

TPCH 10 GB 1GB RAM

687 sec 431 sec 37 %

TPCD 10GB RAID 10

4 disks, 3.2 GB RAM

164 sec 147 sec 10 %

(28)

Related Work

 Graefe’s generalized spool iterator (Graefe [ BTW03 ])

INL

Spool operator

scan Index

lookup

• Pre-fetches multiple outer tuples

• Issue AIO for matching inner TIDS

• Can be replenished

when empty or when

one tuple is joined

(29)

Related Work

 AIO used in database products

• Microsoft SQL Server, IBM DB2, Oracle

• No public documentation on how these systems use AIO

 Asynchronous iteration for evaluating web queries (R.Goldman and J. Widom [ SIGMOD 2000 ] )

• They report results only on web queries

(30)

Conclusion

 Proposed the Asynchronous Iterator Model (AIM)

 Presented asynchronous versions of INL and some operators

 Showed gains of over 50 % in some cases

 AIM can be useful in web-service access and in data integration systems like IBM DataJoiner

 Future work

• Implementing async versions for index lookup, sub plan, sort and merge operator

• Performing async IO in the presence of ordering

constraints

(31)

Thank You

Questions ?

(32)

Plans

 Query 1a :

• Seq scan on lineitem, probe on orders

• Merge Join

-> Index Scan on orders -> Sort lineitem

-> Seq Scan on lineitem

 Query 2a:

• Nested Loop

-> Nested Loop

-> Seq Scan on lineitem -> Index Scan on orders -> Index Scan on customer

(33)

Plans

 Query 2a Merge Join

-> Sort orders -> Merge Join

-> Index Scan on orders -> Sort on lineitem -> Seq Scan on lineitem

-> Index Scan on customer

 Query 2b : Nested Loop

-> Nested Loop

-> Seq Scan on lineitem

-> Index Scan on myorders

-> Index Scan on customer

References

Related documents

Formal Models for distributed and infinite-state systems I Distributed: Concurrent, asynchronous, communicating,..1. I Formal models: Mathematical description, graphical

l Asynchronous database update (via update task) l Application level locking.. Presentation.

I If you don’t like “state objects”, think of them as infinite discrete structures..

I If you don’t like “state objects”, think of them as infinite discrete structures!.

Suppose we have T inner iterations of gradient descent in each outer iteration and T outer outer iterations in our ADMM variant, Parameter Mixing was run with T inner iterations

Dur- ing execution of the inner plan, the rows buffered in a spool iterator in leaf mode are like another input table for the inner plan – in fact, “magic” query execution

 RVTs are partitioned on different key as compared to base table and view records will be on different storage server as compared to base records..

• No party starts round ρ+1 unless all parties have finished round ρ, hence the view is identical to the synchronous protocol.. • The privacy follows from the privacy of