Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error cascading and poor overall performance. Furthermore, the majority of existing approaches ignore ans...