Brain–computer interfaces, a long-sought goal, would open new worlds. People with disabilities could regain autonomy, and anyone could access information and operate seamlessly in a digital world. The goal has yet to be realized because training machines to follow our mental commands, such as moving a cursor across a screen, is a complicated and tedious process. Now, by approaching this machine-learning problem from a brand-new angle, researchers from the University of Helsinki are drastically improving how we can interface with machines. Rather than being taught to do something when we ask, the machine can now learn what we want it to do without being told.
In this proof of concept, published in a paper at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR), a set of computer models learned, from brain signals alone, which feature of a picture a human was interested in and how to use that information to complete photo-editing tasks. The technique holds promise for improved brain–computer interfaces but also reveals ethical conflicts society needs to prepare for.
Beyond explicit training, toward genuine learning
The process of using machine learning to edit images began with human volunteers viewing artificially generated images of human faces. Each participant was asked to perform tasks related to the images, such as identifying a feature like “smiling” or “old”, while their brain responses were recorded via electroencephalography (EEG). The researchers then searched the EEG recordings for a characteristic spike in activity, called the P300, that occurs roughly 300 milliseconds after a stimulus the viewer finds relevant.
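The paper itself trains classifiers on the EEG data, but the underlying idea of P300 detection can be sketched simply: average the stimulus-locked EEG epochs and measure the amplitude in a window around 300 ms. Everything below, including the sampling rate, window, and function name, is a hypothetical illustration rather than the authors' actual pipeline.

```python
import numpy as np

def p300_score(epochs, fs=250, window=(0.25, 0.4)):
    """Score how strongly a set of EEG epochs shows a P300-like response.

    epochs : array of shape (n_trials, n_samples), each trial
             time-locked to stimulus onset at sample 0.
    fs     : sampling rate in Hz (assumed value).
    window : time window in seconds around the expected P300 peak.
    """
    erp = epochs.mean(axis=0)               # average out trial-to-trial noise
    lo, hi = int(window[0] * fs), int(window[1] * fs)
    baseline = erp[: int(0.1 * fs)].mean()  # early post-stimulus baseline
    return erp[lo:hi].max() - baseline      # peak amplitude over baseline
```

Stimuli whose averaged response scores highest are the ones most likely to have evoked a P300, i.e. to be relevant to the viewer's task.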
As Tuukka Ruotsalo, professor of computer science at the University of Helsinki and one of the authors of the paper, explained, “By using this [P300], we can then understand when something on the screen evoked a stronger effect than something else.”
The image a user saw and the P300 spike data serve as inputs to two models: one that deciphers the brain signal and the feature of interest, and a second called a generative adversarial network (GAN), a powerful tool for generating unique images from a source database, a concept already used to combat scientific image fraud. Together, the two models learn to perform an editing task based on what the user’s brain reacted to. Given a new image of a face, the system can then transform features like the smile or hair color.
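One common way such GAN-based editing works, and a plausible reading of the pipeline described above, is to find a direction in the GAN's latent space separating the images the brain flagged as relevant from the rest, then shift a new image's latent code along that direction before regenerating it. The sketch below is a simplified illustration of that idea, not the paper's exact method; the function names and the mean-difference heuristic are assumptions.

```python
import numpy as np

def feature_direction(relevant_latents, other_latents):
    """Estimate a latent-space direction for the feature of interest
    as the difference between the mean latent codes of brain-flagged
    relevant images and the remaining images (a standard heuristic
    for GAN latent editing; the paper's procedure may differ)."""
    d = relevant_latents.mean(axis=0) - other_latents.mean(axis=0)
    return d / np.linalg.norm(d)

def edit_latent(z, direction, strength=2.0):
    """Shift a new image's latent code along the learned direction.
    Decoding the result with the GAN's generator would then yield
    the edited face (e.g. a stronger smile)."""
    return z + strength * direction
```

The appeal of this scheme is that the feature itself is never named: the direction is defined entirely by which images evoked a stronger brain response.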
The truly unique and exciting aspect of this method is that the machine is not trained explicitly to carry out the task. “The important thing is that the model itself doesn’t know anything about these tasks,” said Ruotsalo. The models learn, based on brain activity alone, what the task is. “These two models negotiate what it is that the humans react to and then they gain an understanding, in this case in the image space, to be able to do these transformations,” explained Ruotsalo.
Previous techniques train models to carry out specific functions, like moving a cursor, or to recognize features based on manually annotated databases. Allowing the computer to learn for itself, based solely on the brain’s natural reactions, vastly improves the adaptability of the model and its potential application to anything a human reacts to. “You could think of being able to pick up certain words or music features, or anything,” said Ruotsalo.
A two-way street
While this technique could be applied to a wide variety of tasks, the machine is at the same time learning a great deal about us, which raises some important ethical considerations.
“I think we have to be very careful about bringing in these new signals to applications where they might be misused,” said Ruotsalo. Online life is already heavily monitored, and adding more data, such as brain activity, to the ever-growing database of behaviors we exhibit online further erodes our privacy if we let it. “I think it’s a broader discussion of how we allow and what we consent to be done with these signals that can be recorded from us,” added Ruotsalo.
The natural responses of our brain to stimuli we deem relevant or interesting are a powerful signal to harness. “We maybe wouldn’t like these signals to be used for advertising or other things,” said Ruotsalo. “I don’t want to see a world like that.”
The goal of this work, though, was to demonstrate both the potential and the pitfalls. “We really want to demonstrate what’s possible, but at the same time, raise awareness that this technology is there,” he said.
This means thinking about the policies to curb misuse. “We, as academics, should explore the possibilities, but at the same time demonstrate that this can be done so it also calls for policies and ethical guidelines on how it can be used,” said Ruotsalo.
Reference: T. Ruotsalo et al., “Brain-Supervised Image Editing,” CVPR (2022)