In this paper, we build a multi-style generative model for stylish image captioning that uses multi-modality features: ResNeXt features and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes the multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model on generating human-like captions by examining its performance on two datasets, the PERSONALITY-CAPTIONS dataset, Flic...