Text-based technologies, such as text translation from one language to another and image captioning, are gaining popularity. However, approximately half of the world's languages are estimated to lack a commonly used written form. Consequently, these languages cannot benefit from text-based technologies. This paper presents 1) a new speech technology task, i.e., a speech-to-image generation (S2IG) framework which ...