Abstract In the era of deep learning, modeling for most natural language processing (NLP) tasks has converged into several mainstream paradigms. For example, we usually adopt sequence labeling paradigm to solve a bundle such as POS-tagging, named entity recognition (NER), and chunking, classification like sentiment analysis. With rapid progress pre-trained models, recent years have witnessed ri...