2024: the year of multimodal AI
Retroactive Grade (February 2025): Correct.
I called 2023 the year of natural language interfaces. I was wrong: it was bigger. ChatGPT spawned thousands of copycats and specialized tools, and the tech world transformed in just 13 months.
Today, I see something bigger coming: AI that works with images, video, and sound as naturally as it does with text.
I tested the early versions: showed an AI my fridge and a recipe (it told me what to buy), gave it rough wireframes (it drew a data model), photographed a French menu with my allergies (it found safe dishes).
I predict that by mid-2024, every developer will have access to these capabilities. They'll build tools that look at photos for renovation ideas, watch construction sites for safety violations, turn sketches into architectural plans, and analyze image libraries for patterns.
Remember how ChatGPT changed writing and coding? This shift will be broader. Architects, designers, doctors, and builders will all have AI assistants that can see and hear the world as they do.
We’re moving from reading and writing with AI to showing and seeing. The keyboard was just the beginning.