100daystooffload emacs writing programming speech interface

After my last post on writing docstrings and code simultaneously, a friend1 told me to consider different human languages, since there is some research on attention being language-local. I have not tried this yet since I am mostly an English speaker when working on my computer. Another comment from HN said this:

That's a cool idea. Could the LLM find the right location for the audio stream by simply having the context of the buffer, and the location of the text and audio cursor when the interaction starts?
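As I read it, the suggestion would look roughly like the sketch below: pack the buffer contents, the position of the typing cursor, and the dictated text into one prompt and ask the model for an insertion offset. Nothing here exists yet; my/call-llm, my/locate-dictation-target, and the prompt format are all placeholders, not part of any package.

```elisp
;; A rough sketch, not code from any existing package.  `my/call-llm'
;; stands in for whatever LLM client you have; everything else is
;; ordinary Emacs Lisp.
(defun my/call-llm (prompt)
  "Placeholder: send PROMPT to an LLM backend, return its reply as a string."
  (ignore prompt)
  (error "No LLM backend configured"))

(defun my/locate-dictation-target (transcript)
  "Ask the model where TRANSCRIPT belongs in the current buffer.
Packs the whole buffer, the current point, and TRANSCRIPT into one
prompt and expects a character offset back.  The prompt format is made up."
  (let ((prompt (format (concat "Buffer contents:\n%s\n\n"
                                "The typing cursor is at offset %d.\n"
                                "Dictated text: %s\n\n"
                                "Reply with only the character offset where "
                                "the dictated text should be inserted.")
                        (buffer-substring-no-properties (point-min) (point-max))
                        (point)
                        transcript)))
    (string-to-number (my/call-llm prompt))))
```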

While I have not gotten time to think about working with prose, I have some thoughts about programming with this. Ergonomically, I think the keyboard is good for lower levels of abstraction and voice for higher ones.

Examples of higher-level operations would be refactoring, scaffolding, moving modules, replicating templates, writing stubs, etc. These are all tasks that you can describe to someone in hand-wavy utterances without worrying too much about the details. You probably also don't want to worry about them yourself, and would rather let IDE automation or macros take care of these. Lower-level operations would involve programming with attention, at places where your eyes and cursor need to follow each other for some dedicated amount of time.
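For the voice side, even a naive mapping from phrases to existing commands would cover a chunk of this. A toy sketch, assuming made-up phrases and a hypothetical dispatch function my/dispatch-voice-intent:

```elisp
(require 'seq)

;; A toy dispatch table: each "hand-wavy" phrase maps to an existing
;; Emacs command.  The phrases and the pairings are illustrative only.
(defvar my/voice-intents
  '(("rename this"         . eglot-rename)          ; refactor, needs eglot (Emacs 29+)
    ("find the definition" . xref-find-definitions)
    ("replace everywhere"  . query-replace))
  "Alist mapping spoken phrases to commands.")

(defun my/dispatch-voice-intent (transcript)
  "Run the first command whose phrase occurs in TRANSCRIPT, if any."
  (let ((hit (seq-find (lambda (pair)
                         (string-match-p (regexp-quote (car pair)) transcript))
                       my/voice-intents)))
    (when hit
      (call-interactively (cdr hit)))))
```

Anything fuzzier than this, like the scaffolding and template replication above, would presumably go through an LLM rather than a fixed table.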

The question now is whether I can do both of these in parallel. Based on how my stream of consciousness runs while programming, I believe there are interstices where I could speak and type at the same time for these use cases. The exact experience and its effectiveness will need an implementation and a few experiments.
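The mechanical half of "in parallel" seems like the easier part: if each transcript is applied inside save-excursion, it never touches the typing cursor, so the keyboard stream stays undisturbed while dictated text lands wherever the model points. A minimal sketch, reusing the placeholder my/locate-dictation-target from above; how the transcripts actually arrive (some asynchronous speech backend) is left out:

```elisp
;; Sketch of the "parallel" half: transcripts arrive asynchronously and
;; are applied with `save-excursion', so point -- and the keyboard flow
;; around it -- is never disturbed.
(defun my/apply-transcript (transcript)
  "Insert TRANSCRIPT at the model-chosen spot without moving point."
  (save-excursion
    (goto-char (my/locate-dictation-target transcript))
    (insert transcript)))
```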

Footnotes: