When touch is out of reach
Amidst the sizzle and sweat, grime-coated hands render the use of touch devices not only inconvenient but distinctly uncomfortable. This prototype-driven project proposes a multimodal interface for recipe navigation in the kitchen.
A sticky situation
Slimy raw meat, dusty flour, and a million other things in the kitchen make you think twice before touching your devices. Surveying the HCI literature, we found promising ideas like on-skin gesture control, but they require custom hardware; meanwhile, recipe apps on the market often neglect the during-cooking phase.
Here, we saw the opportunity to challenge the dominant touch paradigm of modern mobile devices.
A touchless cooking companion
No taps, flicks, or swipes. Our reimagined cooking experience integrates air-gesture and voice-query capabilities, allowing the user to focus on creating delicious meals, mess-free.
Ethnographic study (definitely not a hoax to get participants to cook for us)
To gain a more robust understanding of digital recipe usage during cooking, we conducted contextual inquiries with 5 home cooks of varying skill levels.
While users generally managed to complete tasks with workarounds, their experience was suboptimal. Here are our observations:
Through affinity diagramming, we also uncovered these key insights:
1. Visual aids matter: participants stressed the value of visual instructions. Videos and images help home cooks match their actions intuitively.
2. Keep design interventions minimal: distractions from recipe lookups can be mentally taxing. Lightweight solutions help users stay focused on cooking.
3. Participants move around the kitchen: they often work without direct sightlines to their digital devices, emphasizing the need for accessible, hands-free solutions.
These insights guided our exploration. After generating ideas, we pursued a multi-modal approach, focusing on these primary investigation areas:
1. Video as the primary recipe medium
2. Camera-feed-based air gestures to substitute for touch interaction
3. Voice queries to enable omni-directional interaction
Capturing the input
Leveraging the iOS TrueDepth camera, we can measure how far from the device a gesture is performed and prevent unwanted triggers.
To design with this sensor, I developed a technique in Swift to process the depth sensor data, capturing and smoothing user movements. The resulting platform allowed quick testing of various gesture approaches.
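A minimal sketch of the smoothing step, assuming an exponential moving average over per-frame depth readings with a distance cutoff acting as the trigger zone; the names and thresholds here (`HandDepthSmoother`, the 0.6 m cutoff) are illustrative rather than the project's actual values:

```swift
// Exponential moving average over per-frame hand-depth readings.
final class HandDepthSmoother {
    private var smoothed: Float?
    private let alpha: Float  // 0...1; higher = more responsive but noisier

    init(alpha: Float = 0.3) {
        self.alpha = alpha
    }

    /// Feed one raw depth reading (in meters); returns the smoothed value.
    func update(with rawDepth: Float) -> Float {
        guard let previous = smoothed else {
            smoothed = rawDepth
            return rawDepth
        }
        let next = alpha * rawDepth + (1 - alpha) * previous
        smoothed = next
        return next
    }
}

// Example: only accept gestures inside an arm's-length trigger zone.
let smoother = HandDepthSmoother()
let depth = smoother.update(with: 0.42)  // meters from the TrueDepth camera
let isInTriggerZone = depth < 0.6
```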
Uncovering design principles
With the research-through-design methodology, designers develop an understanding of the problem space through tangible prototypes.
In early experiments, we explored an air-gesture pointer system but found it added too much cognitive overhead. Given the user's limited mental resources during cooking, this wasn't ideal. So we limited gestures to essential video interactions, trading flexibility for usability.
Nonetheless, we learned quite a few things from the prototype:
We consolidated our learnings into the following principles:
1. Use natural spatial mapping: when gesturing left, the visuals should move left too. It would be confusing if they moved in the opposite direction.
2. Appeal to the user's existing mental model: for video scrubbing, people associate left with backward and right with forward.
3. Continuous, instant visual feedback is crucial: gestures must always respond to the user's input, even if they change their mind mid-action. This builds trust in the interface, allowing users to interact without hesitation and encouraging exploration of new interactions (see the sketch after this list).
4. Design for imprecision to lighten cognitive load: in our studies, users found larger interface elements easier because they encourage broader, less precise gestures.
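To make principle 3 concrete, here is a minimal sketch of what "always responding" can look like in code: the scrubber mirrors the smoothed hand position on every camera frame, so there is no committed gesture to wait on or back out of. `ScrubberView` and the per-frame callback are hypothetical stand-ins.

```swift
// Hypothetical view model; rendering happens elsewhere.
final class ScrubberView {
    var progress = 0.0  // 0...1 along the video timeline
}

let scrubberView = ScrubberView()

// Called once per camera frame with the smoothed, normalized hand position.
// Mapping position to UI directly, rather than waiting for a completed
// gesture, means abandoning a gesture mid-action simply walks the UI back.
func onHandFrame(normalizedX: Double) {
    scrubberView.progress = min(max(normalizedX, 0.0), 1.0)
}
```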
Perfecting the Scrub
To refocus our exploration, we decided to tackle the more complex scrubbing action first.
Attempt 1: wave to scrub
That led us to experiment with an air-gesture-based “wave to scrub” interaction.
While intuitive, this scrubbing system suffers from a few drawbacks that make it particularly tricky to move forward with.
1. Entering and exiting the motion may accidentally trigger scrubbing, which is difficult to fix without sacrificing responsiveness (see the sketch after this list).
2. The interaction also introduces an awkward in-between-swipe state, in which users have to lift their hands out of the detection zone before they can swipe again.
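A hypothetical persistence filter illustrates the tension in the first drawback: motion only counts as a swipe after it exceeds a speed threshold for several consecutive frames, so raising the frame requirement rejects more entry/exit noise but delays every legitimate swipe by the same amount. The thresholds below are illustrative.

```swift
// Treat motion as a swipe only after it persists for N consecutive frames.
struct SwipeFilter {
    let minSpeed = 0.5        // normalized units per second (illustrative)
    let requiredFrames = 6    // the rejection vs. responsiveness trade-off
    private var streak = 0

    /// Feed one frame's horizontal hand speed; returns true once the motion
    /// has persisted long enough to count as a deliberate swipe.
    mutating func isSwipe(speed: Double) -> Bool {
        streak = abs(speed) >= minSpeed ? streak + 1 : 0
        return streak >= requiredFrames
    }
}
```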
Attempt 2: Dial to scrub
This iteration used a circular scrubbing motion, mimicking knob-turning.
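One plausible way to realize the knob metaphor (a sketch, not our exact implementation) is to accumulate the frame-to-frame change in the hand's angle around a center point and map rotation to seek time:

```swift
import Foundation

// Accumulates hand rotation around a center point, in full revolutions.
struct DialScrubber {
    private var lastAngle: Double?
    private(set) var accumulatedTurns = 0.0

    /// Feed the hand position relative to the center of rotation.
    mutating func update(x: Double, y: Double) {
        let angle = atan2(y, x)
        defer { lastAngle = angle }
        guard let last = lastAngle else { return }
        var delta = angle - last
        // Unwrap across the ±π boundary so a continuous turn stays continuous.
        if delta > .pi { delta -= 2 * .pi }
        if delta < -.pi { delta += 2 * .pi }
        accumulatedTurns += delta / (2 * .pi)
    }
}
```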
The circular motion solves the in-between-swipe problem. Yet, beyond the unresolved rejection of entry and exit motion, it also suffers from ergonomic issues: users found it physically fatiguing to use this gesture for prolonged periods because of the fine motor control involved.
Third time's the charm: Lean to scrub
We were initially stuck, but then we recalled an interesting behavior from our study: a participant navigated quickly through a YouTube video by holding down the arrow keys. This observation inspired us to explore the concept of using “leaning” to capture a user’s intent when scrubbing through a video.
This interaction was ultimately chosen for the final solution because:
1. It is easy to perform: the user only needs to maintain a static position, which requires less motor effort than the constant movement the other solutions demand.
2. With the additional hold state, it seamlessly incorporates the play and pause states into the design.
3. It still provides continuous feedback, with subtle hints that acknowledge the user's input.
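A rough sketch of the underlying logic, assuming a normalized horizontal hand position; the zone boundaries and speeds below are illustrative, not the final values. The hand either rests in a neutral hold zone or leans to one side, and leaning further scrubs faster:

```swift
enum ScrubState {
    case hold                      // hand in the neutral zone: video held/paused
    case scrubbing(rate: Double)   // video-seconds advanced per real second
}

func scrubState(forHandX x: Double) -> ScrubState {
    let neutralZone = 0.35...0.65  // normalized screen coordinates
    if neutralZone.contains(x) {
        return .hold
    }
    // Leaning further from the neutral zone scrubs faster; the sign gives
    // the direction.
    let lean = x < neutralZone.lowerBound
        ? x - neutralZone.lowerBound   // negative: scrub backward
        : x - neutralZone.upperBound   // positive: scrub forward
    return .scrubbing(rate: lean * 20) // full lean ≈ 7x playback speed
}
```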
“The interaction—of pausing it, moving forward, backwards—is actually going really, really well. I am very impressed with that, very nicely done.”
Takeaway: in praise of the barely working prototype
We kept delaying testing sessions, feeling we weren’t “ready.” It wasn’t until we set a hard deadline of “no prototype, no dinner” that we forced ourselves to test our design. Surprisingly, the prototypes we thought weren’t ready turned out to be incredibly valuable for learning.
Takeaway: ideas happen when you make things
While prototyping the air gestures, we explored various techniques for detecting hand movement. As we experimented with different methods, we discovered new ways to capture user input, and new possibilities for interaction emerged.
Takeaway: testing just needs to make sense
Formative qualitative testing and feedback don't have to be overly complex or formal to be valuable; they just need to make sense. In our case, we frequently tested our ideas with friends and family. By observing how they interacted with our prototype and interpreting their reactions, we caught a lot of issues in the design. When approached with caution, guerrilla testing on the go is a powerful tool.
Appendix: a fascinated dad
My dad tried to scrub the video with his head—slightly less violent than pointing a knife at the prototype (yes, someone did try the interaction with a knife).