Automatic Text Summarization of COVID-19 Research Articles Using Recurrent Neural Networks and Coreference Resolution
Abstract
Purpose: The COVID-19 pandemic has created an emergency for the medical community. Researchers must study the scientific literature extensively to discover drugs and vaccines. In a situation where every minute is valuable for saving hundreds of lives, a rapid understanding of scientific articles helps the medical community. Automatic text summarization makes this possible.
Materials and Methods: In this study, a recurrent neural network-based extractive summarization method is proposed. The extractive approach identifies the informative parts of the text. Recurrent neural networks are well suited to analyzing sequences such as text. The proposed method has three phases: sentence encoding, sentence ranking, and summary generation. To improve the performance of the summarization system, a coreference resolution procedure is used. Coreference resolution identifies the mentions in a text that refer to the same real-world entity. This procedure helps the summarization process by identifying the central subject of the text.
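To make the three phases concrete, below is a minimal PyTorch sketch of such a pipeline. The use of a GRU encoder, the layer sizes, and the concatenation of a per-sentence coreference feature vector are illustrative assumptions, not the exact architecture used in the study.

```python
# Minimal sketch of the three-phase pipeline: sentence encoding,
# sentence ranking, and summary generation. Dimensions and the GRU
# encoder are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class ExtractiveSummarizer(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, coref_dim=16, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Phase 1: encode each sentence's word sequence with a GRU.
        self.word_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Phase 2: score sentences from their encoding plus coreference features.
        self.scorer = nn.Linear(hidden_dim + coref_dim, 1)

    def forward(self, sentences, coref_feats):
        # sentences: (num_sents, max_words) tensor of word ids
        # coref_feats: (num_sents, coref_dim) coreference embedding vectors
        _, h = self.word_rnn(self.embedding(sentences))  # h: (1, num_sents, hidden)
        sent_enc = h.squeeze(0)                          # (num_sents, hidden)
        scores = self.scorer(torch.cat([sent_enc, coref_feats], dim=-1))
        return scores.squeeze(-1)                        # one relevance score per sentence

def generate_summary(sent_texts, scores, k=3):
    # Phase 3: pick the top-k sentences and restore document order.
    top = sorted(torch.topk(scores, k).indices.tolist())
    return " ".join(sent_texts[i] for i in top)
```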
Results: The proposed method is evaluated on COVID-19 research articles extracted from the CORD-19 dataset. The results show that combining a recurrent neural network with coreference resolution embedding vectors improves the performance of the summarization system. By achieving a ROUGE-1 recall of 0.53, the proposed method demonstrates that coreference resolution embedding vectors improve the performance of the RNN-based summarization system.
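For reference, a ROUGE-1 recall score like the one reported above can be computed as follows. The rouge_score package (pip install rouge-score) is one common implementation and an assumption here, not necessarily the toolkit used in the study.

```python
# Measuring ROUGE-1 recall between a system summary and a reference summary.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
reference = "The proposed vaccine induced strong antibody responses in trials."
system = "Trials showed the vaccine induced strong antibody responses."

# score(target, prediction) returns precision, recall, and F-measure per metric.
scores = scorer.score(reference, system)
print(f"ROUGE-1 recall: {scores['rouge1'].recall:.2f}")
```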
Conclusion: In this study, coreference information is stored in the form of coreference embedding vectors. Using a recurrent neural network jointly with coreference resolution results in an efficient summarization system.
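As an illustration of how such coreference information can be derived, the sketch below builds simple per-sentence coreference features with the NeuralCoref spaCy extension (which requires spaCy 2.x). Counting mentions of the largest coreference chains per sentence is an assumed encoding for illustration, not necessarily the embedding scheme used in the study.

```python
# Derive per-sentence coreference features with NeuralCoref. The
# mention-count encoding is an illustrative assumption.
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)

def coref_features(text, coref_dim=16):
    doc = nlp(text)
    sents = list(doc.sents)
    # Keep the coref_dim largest chains; these tend to cover the central entities.
    clusters = sorted(doc._.coref_clusters, key=lambda c: -len(c.mentions))[:coref_dim]
    feats = [[0.0] * coref_dim for _ in sents]
    for j, cluster in enumerate(clusters):
        for mention in cluster.mentions:
            for i, sent in enumerate(sents):
                if mention.start >= sent.start and mention.end <= sent.end:
                    feats[i][j] += 1.0  # sentence i mentions entity chain j
    return feats  # (num_sentences, coref_dim) feature matrix
```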
Issue: Vol 7 No 4 (2020)
Section: Original Article(s)
DOI: https://doi.org/10.18502/fbt.v7i4.5321
Keywords: Extractive Summarization; Coreference Resolution; COVID-19; Recurrent Neural Network; Long Short-Term Memory; Gated Recurrent Unit
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.