Artificial intelligence put to the test: How well does Midjourney cope with language and cultural differences?
In the world of artificial intelligence (AI), I keep coming across amazing developments that impact our daily communication and interaction. One such innovation is Midjourney, an AI-powered tool that can generate images based on text prompts. In this blog post, I want to explore whether Midjourney understands other languages and whether there are differences in the generated images depending on the language used. I chose DeepL as the translation tool for this comparison, as I think it is the best translator available.
Multilingualism at Midjourney
First of all, I would like to emphasize that Midjourney does indeed understand multiple languages. The AI developers have designed the tool to be able to recognize text prompts in different languages and respond accordingly. The question I have is whether the generated images turn out differently when using different languages.
To answer this question, I ran a test (with Version 4) in which I entered text prompts in different languages. As an example, I took the simple English sentence "a beautiful sunset over the ocean" and translated it into several languages using DeepL. Here are five of the translated sentences:
German: "ein wunderschöner Sonnenuntergang über dem Ozean"
Spanish: "una hermosa puesta de sol sobre el océano"
French: "un magnifique coucher de soleil sur l'océan"
Chinese: "美丽的海上日落"
Japanese: "海に沈む美しい夕日"
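To keep such a comparison fair, only the language should change between runs. The following is a minimal sketch of how the prompts could be prepared, not the exact setup I used: it appends the same Midjourney parameters to every translation. The helper name build_prompt and the seed value 1234 are assumptions made for illustration; --v and --seed are regular Midjourney parameters.

```python
# Translated prompts from the list above, plus the English original.
PROMPTS = {
    "English": "a beautiful sunset over the ocean",
    "German": "ein wunderschöner Sonnenuntergang über dem Ozean",
    "Spanish": "una hermosa puesta de sol sobre el océano",
    "French": "un magnifique coucher de soleil sur l'océan",
    "Chinese": "美丽的海上日落",
    "Japanese": "海に沈む美しい夕日",
}


def build_prompt(text: str, version: str = "4", seed: int = 1234) -> str:
    """Append identical Midjourney parameters to a translated prompt.

    The seed is an arbitrary, assumed value; it is not taken from the test
    described in this post.
    """
    return f"{text} --v {version} --seed {seed}"


if __name__ == "__main__":
    # Print one ready-to-paste prompt per language.
    for language, text in PROMPTS.items():
        print(f"{language}: {build_prompt(text)}")
```

With the version and seed held constant, differences between the image grids are more likely to come from the language of the prompt than from random variation.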
Comparison of the generated images
I entered the translated text prompts into Midjourney to see what images it would generate. The results were amazing: the images had a lot in common, but they also showed differences, probably due to cultural and linguistic nuances.
It is very interesting to see how the AI interprets the cultural and linguistic differences between the languages and reflects them in the generated images.
One interesting aspect I found is that the generated images also depend somewhat on the quality of the translation. In my test, DeepL provided fairly accurate translations, which resulted in similar images across languages.
Language differences of Midjourney from left to right: French, Spanish, Japanese, Chinese (grid of 4)
The quality of translation plays a crucial role in generating images in Midjourney, as it relies on accurately capturing the meaning of text prompts in different languages.
Complex Text Prompts: Midjourney and the Challenge of Multilayered Scenarios
In order to better analyze and understand Midjourney's different interpretations of complex text prompts, I decided to perform another comparison with a more challenging scenario. For this, I chose a detailed and multifaceted text prompt that combines different elements and thus poses a greater challenge for the AI: "full close up of a daisy in extreme love with detail, covered with water drops, clean white petals, well-lit, photo realistic, vibrant muted color palette, high color contrast, intricate details, black background, avarice photovoque still life, Luxury photo, Elena Korshak style, Maria Sibylla Merian style painting, macro photo, National Geographic photo, extremely detailed background, 8k, photorealistic resolution, hdr, high octane rendering Cinematic lighting, ultra high definition, artstation, Smooth, sharp focus, Photorealism, 8k, Full HD, 3d, unreal engine, hyperreal, surreal art, digital art, world made by light, soft lighting, dynamic composition, 8k, photorealistic resolution, hdr, high octane rendering Cinematic lighting, Realistic Detail, Depth of field, 8k, Full HD, 3d, Super resolution, octane render, award winning photo, shot on Canon DSLR, f/2.8 Long exposure, 25mm --ar 4:5"
Here are the results:
From left to right: English, Dutch, Greek, Turkish (grid of 4)
Looking at the results, the images generated with the English and Dutch prompts are relatively similar and match what one would expect from the prompt. The Turkish and Greek results, on the other hand, are completely different and have nothing in common with the original prompt. Instead of the expected daisy, they resemble newspaper clippings, media reports or social media posts.
Translation or Midjourney: why do the Greek and Turkish results have nothing to do with the original prompt?
This raises the question of where things went wrong: in the translation, or in how Midjourney handled the translated prompt?
To answer this question, it would be important to check the quality of the translations and to take into account linguistic or cultural differences that could affect how the text prompt is interpreted. One possibility would be to compare the Turkish and Greek translations with those from other translation services, or to have them checked by native speakers.
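One rough way to compare translation services without speaking the language is to translate each result back into English by hand and measure how far it drifts from the original. The sketch below assumes you paste the back-translations in yourself; the function names, the service labels and the 0.8 threshold are hypothetical choices for illustration, and the example strings are placeholders, not real output from any translator.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a rough character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_divergent(original: str, back_translations: dict[str, str],
                   threshold: float = 0.8) -> list[str]:
    """List the services whose back-translation drifts far from the original."""
    return [
        service
        for service, text in back_translations.items()
        if similarity(original, text) < threshold
    ]


if __name__ == "__main__":
    original = "a beautiful sunset over the ocean"
    # Placeholder back-translations, invented for this example only.
    back_translations = {
        "service_a": "a beautiful sunset over the ocean",
        "service_b": "newspaper clipping",
    }
    print(flag_divergent(original, back_translations))
```

Character-level similarity is a crude measure. It can point to translations worth a closer look, but it does not replace a check by a native speaker.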
Another factor could be that Midjourney has difficulty grasping the meaning of the complex text prompt in certain languages. It is conceivable that the AI interprets complex scenarios well and generates appropriate images in some languages, while it reaches its limits in others.
Conclusion
Overall, my tests show that Midjourney is capable of understanding and responding to different languages. However, there are differences in the generated images, possibly due to cultural and linguistic nuances. The quality of the translation also plays a role, as the AI relies on accurately capturing the meaning of text prompts in different languages.
It might be worthwhile to write prompts in the language in which one has the largest vocabulary to achieve the desired results. Alternatively, one could choose the language in which the generated images come closest to the desired end result. For the latter, however, it would be necessary to run multiple tests across different genres in order to draw informed conclusions.
Midjourney's potential to map cultural differences in images is interesting, but it remains to be seen how well this AI can handle increasingly complex linguistic and cultural challenges in the future.