Automatic question and answer generation

Academic year: 2022


(1)

Automating reading comprehension by generating question and answer pairs

Vishwajeet Kumar¹, Kireeti Boorla², Ganesh Ramakrishnan², Yuan-Fang Li³

¹ IITB-Monash Research Academy, India

² IIT Bombay, India

³ Monash University, Australia

(2)

Automatic question and answer generation

A system to automatically generate questions and answers from text.

Some text

Sachin Tendulkar received the Arjuna Award in 1994 for his outstanding sporting achievement, the Rajiv Gandhi Khel Ratna award in 1997...

Questions

1. When did Sachin Tendulkar receive the Arjuna Award?

Ans: 1994

2. Which award did Sachin Tendulkar receive in 1994 for his outstanding sporting achievement?

Ans: Arjuna Award

3. When did Sachin Tendulkar receive the Rajiv Gandhi Khel Ratna Award?

Ans: 1997

(3)

Motivation

Sachin Ramesh Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest batsmen of all time. He took up cricket at the age of eleven, made his Test debut on 15 November 1989 against Pakistan in Karachi at the age of sixteen, and went on to represent Mumbai domestically and India internationally for close to twenty-four years...

How would someone tell that you have read this text?


(6)

Why is this problem Challenging?

• The question must be relevant to the text

• The answer must be unambiguous

• The question must be challenging and well formed


(9)

Existing Work

Template Based [Mazidi and Nielsen, 2014, Mostow and Chen, 2009]

• Use crowd-sourced templates such as "What is X?"

Syntax Based [Heilman, 2011]

• Rules for declarative-to-interrogative sentence transformation

• Only syntax is considered, not semantics.

• Rely heavily on NLP tools.

Vanilla Seq2Seq for Question Generation [Du et al., 2017]

• First neural-network approach to question generation from text.

• Uses a vanilla Seq2Seq model for question generation.


(10)

Some other related work

Generate question given a fact/triple from KB/Ontology.

Example: <Fires Creek, contained by, Nantahala National Forest> → Which forest is Fires Creek in?

Template based [Seyler et al., 2015]

• Assumption: facts are present in a domain-dependent knowledge base.

• Generates questions from facts using templates.

Factoid question generation using RNN [Serban et al., 2016]

• Proposes factoid question generation from Freebase triples (subject, relation, object).

• Embeds each fact using KG embedding techniques such as TransE.


(13)

Limitations of previous approaches

• Mostly rule-based or template-based.

• Do not generate an answer corresponding to the question.

• Use an overly simple set of linguistic features.


(16)

Our contribution

• A pointer-network-based method for automatic answer selection.

• A sequence-to-sequence model with attention, augmented with a rich set of linguistic features and answer encoding.


(21)

Automatic question and answer generation using seq2seq model with pointer network

Figure 1: High-level architecture of our question generation model. A sentence encoder produces a thought vector for the input sentence; answer selection (named entity selection and a pointer network) picks the pivotal answer; answer and feature encoding feeds the question decoder. Example: for "Donald Trump is the current President of the United States of America.", the selected answer is "Donald Trump" and the generated question is "Who is the current president of the United States of America?"

(22)

Named Entity Selection

• The sentence S = (w_1, w_2, ..., w_n) is encoded using a 2-layer LSTM network into hidden states H = (h^s_1, h^s_2, ..., h^s_n).

• For each named entity NE = (n_i, ..., n_j), create a representation R = <h^ne_mean>.

• R is fed to an MLP along with <h^s_n; h^s_mean> to get the probability of the named entity being the pivotal answer (the most relevant answer to ask a question about):

P(NE_i | S) = softmax(R_i · W + B)

where h^s_n is the final state, h^s_mean is the mean of all activations, and h^ne_mean is the mean of the activations in the NE span (h^s_i, ..., h^s_j).
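The named entity selection step above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: a single scoring vector `w` stands in for the MLP, and the hidden states are toy 2-dimensional vectors.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]

def select_pivotal_answer(hidden, ne_spans, w):
    """Score each named-entity span as the pivotal answer.

    hidden:   encoder hidden states, one vector per word.
    ne_spans: list of (i, j) word-index spans, one per named entity.
    w:        scoring weights over the concatenated representation
              <h_ne_mean ; h_n ; h_mean> (stand-in for the MLP).
    """
    h_n = hidden[-1]                  # final state h^s_n
    h_mean = mean_vec(hidden)         # mean of all activations h^s_mean
    scores = []
    for (i, j) in ne_spans:
        h_ne_mean = mean_vec(hidden[i:j + 1])  # mean over the NE span
        r = h_ne_mean + h_n + h_mean           # list concatenation
        scores.append(sum(a * b for a, b in zip(r, w)))
    return softmax(scores)            # P(NE_i | S)

# Toy example: four words, 2-dim hidden states, two entity spans.
hidden = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1], [0.4, 0.5]]
probs = select_pivotal_answer(hidden, [(0, 1), (3, 3)], w=[1.0] * 6)
```

The softmax over entity scores gives a distribution over candidate answers, mirroring P(NE_i | S) above.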


(26)

Answer selection using Pointer networks

• Given encoder hidden states H = (h_1, h_2, ..., h_n), the probability of generating the output sequence O = (o_1, o_2, ..., o_m) is:

P(O | S) = ∏_i P(o_i | o_1, o_2, ..., o_{i−1}; H)

• The probability distribution is modeled as:

u_i = v^T tanh(W_e Ĥ + W_d D_i) (1)

P(O | S) = softmax(u_i) (2)
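A single pointer-network scoring step (Eqs. 1–2) can be illustrated as follows. This is a sketch under simplifying assumptions: scalar stand-ins `w_e`, `w_d`, `v` replace the weight matrices, and states are 1-dimensional; a real model would use learned matrices and a decoder state D_i.

```python
import math

def pointer_scores(encoder_states, decoder_state, w_e, w_d, v):
    """One pointer-network step: a distribution over input positions.

    u_i = v * tanh(w_e * h_i + w_d * d), then softmax over positions,
    mirroring Eqs. (1)-(2) with scalar weights.
    """
    u = [v * math.tanh(w_e * h + w_d * decoder_state) for h in encoder_states]
    m = max(u)
    exps = [math.exp(x - m) for x in u]
    s = sum(exps)
    return [e / s for e in exps]      # P(o_i | ...; H)

# Toy 1-dim example: five encoder positions, one decoder state.
probs = pointer_scores([0.1, 0.9, 0.2, 0.8, 0.3], decoder_state=0.5,
                       w_e=1.0, w_d=0.5, v=2.0)
best = max(range(len(probs)), key=lambda i: probs[i])
```

The pointer "points" at the input position with the highest probability, which is how the answer span is selected from the sentence rather than from a fixed vocabulary.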


(28)

POS Tag and Dependency Label

Sentence: Donald Trump is the President.

Donald Trump | NNP | PERSON | nsubj
is | VBZ | O | cop
the | DT | O | det
President | NNP | O | root
. | . | O | punct

Question: Who is Donald Trump ?

Who | WP | O | root
is | VBZ | O | cop
Donald Trump | NNP | PERSON | nsubj
? | . | O | punct

(29)

Features and Answer Encoding

• POS tag, named entity tag, and dependency label are used as linguistic features.

• A rich set of linguistic features helps the model learn transformation rules that generalize better.

• The dependency label is the edge label connecting each word to its parent in the dependency tree.
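One plausible way to realize the feature encoding described above is to concatenate the word embedding with one-hot encodings of the three linguistic features. The exact scheme is an assumption (the slides do not specify it), and the tag inventories below are illustrative, taken from the Donald Trump example.

```python
def encode_token(word_vec, pos, ner, dep, pos_tags, ner_tags, dep_labels):
    """Feature-encoded word embedding: the word vector concatenated with
    one-hot encodings of its POS tag, NER tag and dependency label."""
    def one_hot(value, vocab):
        return [1.0 if value == v else 0.0 for v in vocab]
    return (word_vec + one_hot(pos, pos_tags)
            + one_hot(ner, ner_tags) + one_hot(dep, dep_labels))

# Illustrative tag inventories (a real system uses the full tag sets).
POS = ["NNP", "VBZ", "DT", "WP", "."]
NER = ["PERSON", "O"]
DEP = ["nsubj", "cop", "det", "root", "punct"]

# "Donald" in "Donald Trump is the President.", with a toy 2-dim embedding.
vec = encode_token([0.2, -0.1], "NNP", "PERSON", "nsubj", POS, NER, DEP)
```

The resulting vector (word embedding plus exactly one active bit per feature group) is what the sentence encoder consumes at each time step.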


(32)

Sentence Encoder

• A BiLSTM captures both the left context and the right context:

→h_t = f(→W w_t + →V →h_{t−1} + →b),  ←h_t = f(←W w_t + ←V ←h_{t+1} + ←b) (3)

ĥ_t = g(U h_t + c) = g(U [→h_t, ←h_t] + c) (4)

ĥ_t is the thought vector; W, V, and U ∈ R^{n×m} are trainable parameters, and w_t ∈ R^{p×q×r} is the feature-encoded word embedding at time step t.
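Equations (3)–(4) can be traced with a minimal pure-Python sketch. For readability it uses scalar weights, f = tanh, and g = identity in place of the LSTM cells and learned matrices, so it shows only the two-pass bidirectional structure, not the actual model.

```python
import math

def bi_encode(words, w, v, b, u, c):
    """Bidirectional encoding per Eqs. (3)-(4), scalar version.

    Forward and backward recurrences run over the sequence, then the two
    directions are combined into one "thought vector" per time step.
    """
    n = len(words)
    fwd, bwd = [0.0] * n, [0.0] * n
    h = 0.0
    for t in range(n):                 # left-to-right pass, Eq. (3)
        h = math.tanh(w * words[t] + v * h + b)
        fwd[t] = h
    h = 0.0
    for t in reversed(range(n)):       # right-to-left pass, Eq. (3)
        h = math.tanh(w * words[t] + v * h + b)
        bwd[t] = h
    # Eq. (4): combine both directions (g = identity here).
    return [u * (fwd[t] + bwd[t]) + c for t in range(n)]

states = bi_encode([0.5, -0.3, 0.8], w=1.0, v=0.5, b=0.0, u=1.0, c=0.0)
```

Each output state mixes information from words on both sides of position t, which is the point of using a BiLSTM rather than a unidirectional encoder.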


(35)

Question Decoder

• A 2-layer LSTM network.

• Decoder:

P(Q | S; θ) = softmax(W_s tanh(W_r [h_t, c_t] + b)) (5)

• Beam search with beam size 3 to decode the question.

• The decoder is suitably modified and integrated with an attention mechanism to handle the rare-word problem.

where W_s and W_r are weight matrices and tanh is the activation function.
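Beam-search decoding with beam size 3 can be sketched as follows. The `step_probs` function here is a hypothetical stand-in: in the real model, these per-token probabilities come from the attention-augmented decoder LSTM of Eq. (5).

```python
import math

def beam_search(step_probs, beam_size=3):
    """Beam-search decoding sketch (beam size 3, as in the slide).

    step_probs(prefix) returns {token: P(token | prefix)}. Beams are
    (token list, cumulative log-probability) pairs; finished beams
    (ending in <eos>) are carried forward unchanged.
    """
    beams = [([], 0.0)]
    while True:
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, score))
                continue
            for tok, p in step_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        candidates.sort(key=lambda x: x[1], reverse=True)
        beams = candidates[:beam_size]
        if all(t and t[-1] == "<eos>" for t, _ in beams):
            return beams[0][0]

# Toy distribution: prefers "who", then "is", then ends the question.
def toy_probs(prefix):
    table = {0: {"who": 0.7, "what": 0.3},
             1: {"is": 0.9, "was": 0.1},
             2: {"<eos>": 1.0}}
    return table[min(len(prefix), 2)]

question = beam_search(toy_probs)
```

Keeping the top 3 partial questions at each step lets the decoder recover from a locally suboptimal first word, which greedy decoding cannot.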


(39)

Attention Mechanism

Attention distribution:

e^t_i = v^T tanh(W_eh h_i + W_sh s_t + b_att) (6)

a^t = softmax(e^t) (7)

c_t = Σ_i a^t_i h_i (8)

The probability distribution over the vocabulary is:

P_vocab = softmax(W_v [s_t, c_t] + b_v) (9)

The overall loss is calculated as:

LOSS = −(1/T) Σ^T_{t=0} log P_vocab(word_t) (10)

W_eh, W_sh, and b_att are learnable model parameters; W_v and b_v are trainable parameters.
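Equations (6)–(8) can be traced with a small sketch. As before, scalar weights stand in for W_eh, W_sh, and v, and the encoder and decoder states are 1-dimensional; this only illustrates the attention arithmetic, not the trained model.

```python
import math

def attend(encoder_states, decoder_state, w_eh, w_sh, b_att, v):
    """One attention step per Eqs. (6)-(8), scalar version."""
    # Eq. (6): alignment energies between decoder state and each h_i.
    e = [v * math.tanh(w_eh * h + w_sh * decoder_state + b_att)
         for h in encoder_states]
    # Eq. (7): normalize energies into an attention distribution.
    m = max(e)
    exps = [math.exp(x - m) for x in e]
    a = [x / sum(exps) for x in exps]
    # Eq. (8): context vector as the attention-weighted sum of states.
    c = sum(ai * hi for ai, hi in zip(a, encoder_states))
    return a, c

a, c = attend([0.2, 0.9, 0.1], decoder_state=0.4,
              w_eh=1.0, w_sh=0.5, b_att=0.0, v=1.5)
```

The context vector c_t is then concatenated with the decoder state s_t and projected through Eq. (9) to produce the vocabulary distribution.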


(45)

Human evaluation results

System | p1 (%) | p2 (%) | p3 (%)
QG [Du et al., 2017] | 51.6 | 48 | 52.3
QG+F | 59.6 | 57 | 64.6
QG+F+NE | 57 | 52.6 | 67
QG+GAE | 44 | 35.3 | 50.6
QG+F+AES | 51 | 47.3 | 55.3
QG+F+AEB | 61 | 60.6 | 71.3
QG+F+GAE | 63 | 61 | 67

Table 1: Human evaluation results on S_te. Parameters: p1 is the percentage of syntactically correct questions, p2 the percentage of semantically correct questions, and p3 the percentage of relevant questions.

F: linguistic features (optionally added to any model); NE: named entity selection; AES: sequence pointer network; AEB: boundary pointer network; GAE: ground-truth answer encoding. AES, AEB, and GAE are the different alternatives for encoding the pivotal answer.

(46)

Automatic evaluation results

Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L
QG [Du et al., 2017] | 39.97 | 22.39 | 14.39 | 9.64 | 14.34 | 37.04
QG+F | 41.89 | 24.37 | 15.92 | 10.74 | 15.854 | 37.762
QG+F+NE | 41.54 | 23.77 | 15.32 | 10.24 | 15.906 | 36.465
QG+GAE | 43.35 | 24.06 | 14.85 | 9.40 | 15.65 | 37.84
QG+F+AES | 43.54 | 25.69 | 17.07 | 11.83 | 16.71 | 38.22
QG+F+AEB | 42.98 | 25.65 | 17.19 | 12.07 | 16.72 | 38.50
QG+F+GAE | 46.32 | 28.81 | 19.67 | 13.85 | 18.51 | 41.75

AES, AEB, and GAE are the different alternatives for encoding the pivotal answer.

(47)

Some sample questions generated


(48)

Conclusion

• We introduced a novel two-stage process to generate question-answer pairs from text.

• We proposed an automatic answer selection technique using pointer networks.

• We incorporated an attention mechanism into the decoder to handle the rare-word problem.

(49)

Questions?


(50)

References I

Du, X., Shao, J., and Cardie, C. (2017). Learning to ask: Neural question generation for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1342–1352.

Heilman, M. (2011). Automatic factual question generation from text. PhD thesis, Carnegie Mellon University.

Mazidi, K. and Nielsen, R. D. (2014). Linguistic considerations in automatic question generation. In ACL (2), pages 321–326.

(51)

References II

Mostow, J. and Chen, W. (2009). Generating instruction automatically for the reading strategy of self-questioning. In AIED, pages 465–472.

Serban, I. V., García-Durán, A., Gulcehre, C., Ahn, S., Chandar, S., Courville, A., and Bengio, Y. (2016). Generating factoid questions with recurrent neural networks: The 30M factoid question-answer corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 588–598.

Seyler, D., Berberich, K., and Weikum, G. (2015). Question generation from knowledge graphs. PhD thesis, Universität des Saarlandes, Saarbrücken.
