School of Science and Technology 科技學院
Computing Programmes 電腦學系

AI-Powered Question Generator

Jason Chun Wai KWOK

  
Programme: Bachelor of Computing with Honours in Internet Technology
Supervisor: Dr. Keith Lee
Areas: Intelligent Applications
Year of Completion: 2020
Award: 2nd Runner Up, IEEE (HK) Computational Intelligence Chapter 17th FYP Competition

Objectives

The aim of this project is to develop a system that automatically generates questions from text using AI technology. The system generates wh-questions and grammar questions, and it is designed mainly for educational purposes: for example, helping teachers generate quiz or assignment questions for their students, and helping parents generate practice questions for their children. It can also generate questions for building reading comprehension datasets, which are widely used in Natural Language Processing (NLP) research.

The proposed automated question generation system's objectives are fourfold:

  • Develop a subsystem that can generate wh-questions
  • Develop a subsystem that can generate grammar questions
  • Integrate the subsystems into a web backend
  • Build an easy-to-use web user interface

Background and Methodology

The key technology of the system is T5, a Transformer encoder-decoder model. It is the main component used for generating wh-questions. Like many other Transformer-based models, T5 follows a pre-training and fine-tuning pattern, so it benefits from transfer learning.

Figure 1: T5 framework showing the input and output in machine translation, classification, and text summarization tasks
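
Because T5 treats every task as text in, text out, question generation can be framed the same way: the answer and its passage go in as one string, and the question comes out as text. The following is a minimal sketch of that framing only. The function name, the "generate question:" prefix, and the field labels are assumptions made for illustration; this page does not state the prompt format the project used.

```python
def build_qg_input(context: str, answer: str) -> str:
    """Pack an answer and its passage into one text-to-text input string.

    The "generate question:" prefix and the "answer:" / "context:" labels
    are hypothetical, chosen only to illustrate the text-to-text idea.
    """
    # Collapse whitespace so line breaks in the passage do not leak into the prompt.
    clean_context = " ".join(context.split())
    clean_answer = " ".join(answer.split())
    return f"generate question: answer: {clean_answer} context: {clean_context}"


example = build_qg_input(
    context="The Eiffel Tower\nis in Paris.",
    answer="Paris",
)
print(example)
```

A fine-tuned model would receive this string as its input and be trained to emit the matching question as its output.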

Question answering is the sibling task of question generation, so a question answering dataset can also be used to train a question generation model. In our approach, we used the SQuAD 2.0 dataset (Rajpurkar et al. 2018) to fine-tune T5. SQuAD 2.0 is a reading comprehension dataset consisting of questions, answers, and articles, and it is also a benchmark dataset for question answering.
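
One way to reuse a question answering dataset for question generation is to swap the roles of question and answer: the answer and passage become the input, and the human-written question becomes the target. The sketch below walks a SQuAD 2.0-style dictionary and yields such pairs. The function name and the layout of the source string are assumptions, and skipping unanswerable questions is a choice made for this sketch, not a documented step of the project.

```python
from typing import Dict, Iterator, Tuple


def squad_to_qg_pairs(squad: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) training pairs from a SQuAD 2.0-style dict.

    source = answer plus passage, target = the human-written question.
    The source string layout is an assumption for illustration.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # SQuAD 2.0 marks unanswerable questions; they carry no answer
                # to condition on, so this sketch skips them.
                if qa.get("is_impossible", False) or not qa["answers"]:
                    continue
                answer = qa["answers"][0]["text"]
                yield f"answer: {answer} context: {context}", qa["question"]


# A tiny hand-written sample in the SQuAD 2.0 layout (not real dataset content).
sample = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "The Nile flows through Egypt.",
            "qas": [
                {"id": "1", "question": "Which country does the Nile flow through?",
                 "is_impossible": False,
                 "answers": [{"text": "Egypt", "answer_start": 23}]},
                {"id": "2", "question": "Who built the Nile?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }]
}

pairs = list(squad_to_qg_pairs(sample))
print(pairs)
```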

Another key technology of this project is React, one of the most popular JavaScript libraries for building user interfaces. We used it to build our single-page application (SPA) user interface.

Other technologies used in this project include part-of-speech (POS) tagging and named-entity recognition (NER). These language processing tools help to extract linguistic features from the text.

System Architecture

In this project, the major component is T5 (Text-To-Text Transfer Transformer), a Transformer encoder-decoder model similar to the original Transformer architecture. Although some Transformer-based approaches to question generation exist, to our knowledge T5 had not been used for this task before. Combined with other language processing components such as part-of-speech (POS) tagging and named-entity recognition (NER) tagging, the system can generate wh-questions and multiple-choice grammar questions. The user controls which questions are generated by selecting their answers. The whole system is hosted on a web server as a web application that is accessible through a browser. The user interface is a single-page application written in React, which also provides some additional functions.

Figure 2: The overall design of the system

Figure 3: A higher-level overview of the system

System Design and Implementation

To set up the prototype, a host machine must run the web server. The server is written in Python, so the host needs Python and all the required libraries installed. Once a server is running, other users can use the web application by entering the server URL in a browser.
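
The shape of such a backend can be sketched with Python's standard library alone. This page does not name the web framework, the routes, or the request format that the project used, so everything specific here is hypothetical: the `/generate` route, the `text` and `keywords` JSON fields, and `placeholder_generator`, which stands in for the fine-tuned model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def placeholder_generator(text: str, answer: str) -> str:
    # Stand-in for the fine-tuned model; returns a dummy question.
    return f"What does the text say about {answer}?"


def handle_generate(body: bytes,
                    generate: Callable[[str, str], str] = placeholder_generator
                    ) -> Tuple[int, Dict]:
    """Validate a JSON request body and return (status code, response payload)."""
    try:
        payload = json.loads(body)
        text, keywords = payload["text"], payload["keywords"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with 'text' and 'keywords'"}
    if not isinstance(text, str) or not isinstance(keywords, list):
        return 400, {"error": "'text' must be a string and 'keywords' a list"}
    # One question per selected keyword, which serves as the answer.
    questions = [{"answer": k, "question": generate(text, str(k))} for k in keywords]
    return 200, {"questions": questions}


class QuestionHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_generate(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port: int = 8000) -> None:
    """Serve until interrupted; clients then reach http://<host>:<port>/generate."""
    HTTPServer(("", port), QuestionHandler).serve_forever()
```

Calling `run_server()` on the host starts the service. Keeping the request logic in `handle_generate` separates it from the HTTP plumbing, so it can be exercised without opening a socket.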

Figure 4: The user interface

The navigation bar shows the three steps of the system: input text, select keywords, and results. The steps are clickable, so the user can switch between pages. Because it is a single-page application, user interaction does not trigger a full page reload. After inputting a text, the user clicks the “Proceed” button to go to the next step.

Figure 5: The user interface of the second step: select keywords

The user then receives the text with keywords auto-tagged by NER. These keywords are the answers used for generating questions. The user can control which questions are generated by adding or deleting keywords. Clicking the red cross button next to a keyword deletes it, and clicking the “Remove all” button removes all keywords. To add a new keyword, the user highlights any words, and a green add button appears next to the highlighted words. After that, the user clicks the “Proceed” button to start generating questions.
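
The keyword step amounts to maintaining a set of text spans: seeded by the NER tagger, then edited by the user. The sketch below models that state in Python for illustration (the actual interface is written in React). The class name, the character-offset representation, and the rule that overlapping highlights are rejected are all assumptions, and the NER output is hard-coded rather than produced by a tagger.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class KeywordSelection:
    text: str
    spans: List[Span] = field(default_factory=list)

    def add(self, start: int, end: int) -> bool:
        """Add a highlighted span unless it is empty, out of range, or overlaps another."""
        if not (0 <= start < end <= len(self.text)):
            return False
        if any(start < e and s < end for s, e in self.spans):
            return False
        self.spans.append((start, end))
        self.spans.sort()  # keep keywords in reading order
        return True

    def remove(self, start: int, end: int) -> None:
        """Delete one keyword (the red cross button)."""
        self.spans = [span for span in self.spans if span != (start, end)]

    def remove_all(self) -> None:
        """Delete every keyword (the "Remove all" button)."""
        self.spans.clear()

    def keywords(self) -> List[str]:
        return [self.text[s:e] for s, e in self.spans]


text = "Marie Curie won the Nobel Prize in 1903."
ner_spans = [(0, 11), (35, 39)]  # pretend NER output: a person and a date
selection = KeywordSelection(text, list(ner_spans))

start = text.index("Nobel Prize")
selection.add(start, start + len("Nobel Prize"))  # user highlights a phrase
selection.remove(35, 39)                          # user deletes the date
print(selection.keywords())  # ['Marie Curie', 'Nobel Prize']
```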

Figure 6: The result page showing the generated wh-questions

Figure 7: Clicking the “Source” button near a question shows the source of the question, with the answer to that question highlighted

Figure 8: The user can click on any question and edit it in place, so there is no need to copy the question elsewhere to edit it. The user can also click the “Delete” button to delete the whole question; the question numbers are adjusted automatically.

Figure 9: The user can click the “Show Answer” checkbox to show or hide all the answers, and click the “Copy to Clipboard” button to copy all the questions to the clipboard.

Figure 10: The result page showing the generated grammar questions.

The system generates one question for each input sentence. The “Shuffle” button shuffles the order of the choices. As on the wh-questions page, the user can choose to show or hide answers, and copy the questions to the clipboard.
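
A multiple-choice grammar question with shuffling can be sketched as a blanked sentence plus a list of choices. This page does not describe how the target word or the distractors are chosen, so in this sketch the caller supplies both; the names `GrammarQuestion` and `make_question` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GrammarQuestion:
    stem: str            # sentence with the target word blanked out
    choices: List[str]   # correct answer plus distractors
    answer: str

    def shuffle(self, rng: Optional[random.Random] = None) -> None:
        """Reorder the choices in place; the answer is tracked by value, not position."""
        (rng or random).shuffle(self.choices)


def make_question(sentence: str, target: str, distractors: List[str]) -> GrammarQuestion:
    """Blank the first occurrence of `target` and bundle it with the distractors."""
    words = sentence.split()
    if target not in words:
        raise ValueError("target word must appear in the sentence")
    words[words.index(target)] = "_____"
    return GrammarQuestion(" ".join(words), [target, *distractors], target)


q = make_question(
    "She has finished her homework",
    target="finished",  # assumed to be picked with the help of POS tags
    distractors=["finish", "finishing", "finishes"],
)
q.shuffle(random.Random(0))  # seeded only to make the sketch repeatable
print(q.stem)
print(q.choices)
```

Storing the answer by value means the “Shuffle” action never needs to update an answer index.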

Evaluation

Figure 11: Result of the survey

Figure 12: Result of the survey

Table 1. The result of the response time test

The results show that most respondents are satisfied with the usefulness, usability, and look-and-feel of our system. There were also some comments about the auto-selected keywords, the loading time, and the quality of the questions, which indicate that there is still room for improvement. We also conducted a question type test and a response time test. These tests revealed limitations and weaknesses of the system that are consistent with the survey responses, especially in the keyword/answer selection process and the long processing time.

Conclusion and Future Development

A keyword/answer selection process that relies only on NER tagging cannot select all possible answers, and this affects the quality of the generated questions. We therefore suggest improving this part in the future by using another selection approach, which would also reduce user involvement in the generation process.

As for the response time, as mentioned in the evaluation, generating each question takes about 1 second, which is not ideal. We therefore suggest improving this area by optimizing the program or by finding another approach to generating questions, such as a typical encoder-decoder solution with some additional features, which should be much faster because the model is smaller.

We hope that in the near future question generation technology will become powerful enough to genuinely help teachers generate questions for their students, so that teachers have more time to focus on teaching.
