Visual content, such as illustrations and images, plays a major role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual content and retain only the textual parts. In this work, to emphasize the importance of multimodal content, we propose the Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal content but also to provide multimodal answers. To support MP...