
Immerzen, create your AI-generated oasis of calm in XR

How it started

When I tried VR for the first time, I simply created a scene in Unity, added a sky material, and put on the VR headset.

The sky looked so spectacular that for a second I thought I felt the warmth of the sun shining down on me. The world, seemingly expanding to infinity, instantly lifted my spirits,

and I just sat there, captivated, for a long time.

 

Reflecting on this experience later, I realized that for busy city dwellers with scarce personal time and limited nature exposure, a virtual retreat can provide solace, relieve anxiety and restore inner peace. 

Possibility unlocked by generative AI

Prior to generative AI, constructing 3D environments relied heavily on 3D artists. While customization was possible to a certain extent, such as choosing and placing furniture in The Sims, the use of pre-made 3D models and 2D textures limited the diversity of worlds users could create.

With generative AI, users can now create a space that visualizes their own definition of a meditative environment.

Design process


With Apple's Vision Pro unveiled during development, this project takes a particular interest in UI prototyping and testing for spatial computing.

1 Competitor & Feature Analysis


2.1 Define needs and functions

After work, instead of meditating, I relax by watching YouTube, listening to music, and editing vacation photos.

 

Different people find relaxation in various ways. Just like meditation, Immerzen can be a new tool to help immerse oneself in moments of peace.

Using the well-received features from the competitor analysis as a guide, Immerzen follows three rules:

1. Keep functions as simple as possible.

2. Include music.

3. Generation based on user's feeling/mood.

The list of functions:

- Environment (HDRI) generation

- Mood picker

- Music generation based on the generated environment and user's mood

- Save environment and music to collection


2.2 User journey

Ver. 1


After a user test with AI-generated music samples, many testers expressed their dislike for the music. This highlighted that people have distinct preferences when it comes to music.

 

To address this, the second version embeds a streaming service such as Apple Music to guide the music generation toward each individual's preferences.

Ver. 2


3. UI for spatial computing: insights from experiments

Insight 1: Think in 3D and use transparency for spacing

In XR, the field of view (FOV) poses a challenge for traditional 2D spacing: it spreads the UI out and makes interactables at the panel's edge difficult to reach.

 

However, applying the same spacing in depth resolves this issue by creating separation between interactables while keeping the interface compact for easy accessibility with minimal head and hand movement.

 

Transparent layers are crucial to leveraging depth layering. The top layer, made opaque, should hold the most important function and stand out as the visual center.
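The ergonomic gain from depth stacking can be sketched with simple trigonometry (the distances and offsets below are illustrative assumptions, not measurements from the project):

```python
import math

def angle_to_target(x_offset_m: float, depth_m: float) -> float:
    """Horizontal head rotation (degrees) needed to center a UI element
    placed x_offset_m to the side, depth_m in front of the user."""
    return math.degrees(math.atan2(x_offset_m, depth_m))

# Wide 2D panel: a button 0.8 m to the side of a panel 1 m away
flat_edge = angle_to_target(0.8, 1.0)   # roughly 39 degrees of rotation

# Depth-stacked layout: the same button stays near center (0.15 m off-axis)
# on a second layer 0.3 m behind the front one (1.3 m total depth)
stacked = angle_to_target(0.15, 1.3)    # under 7 degrees of rotation
```

The numbers show why the stacked layout feels compact: the edge interactable needs only a small gaze shift instead of a large head turn.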


UI with traditional 2D spacing


UI with transparency and layering

Insight 2: Focus on the 60° golden FOV

Before developing in XR, I imagined the interface scattered all around me, with me moving from spot to spot. A month in, I imagined tabs surrounding me in a 360° fashion, causing me to constantly rotate, perhaps on a swivel chair.

 

However, after 6 months of prototyping and testing, it became evident that the majority of interfaces should remain within the user's 60° FOV, positioned directly in front of them.

The initial observation from user tests is that people prefer minimal movement. Even holding up a controller becomes tiring within minutes, let alone turning their heads or walking. Studies, like the one by João Belo et al., support this finding and even propose a tool to measure interaction cost.
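The 60° guideline is easy to enforce in code. A minimal sketch (not project code; the half-angle interpretation of the cone is an assumption) that tests whether a world-space element sits inside the central cone of the user's gaze:

```python
import math

def in_golden_fov(head_forward, to_element, fov_deg=60.0):
    """True if a UI element lies within the user's central FOV cone.
    head_forward and to_element are 3D direction vectors (need not be unit)."""
    dot = sum(a * b for a, b in zip(head_forward, to_element))
    norm = (math.sqrt(sum(a * a for a in head_forward))
            * math.sqrt(sum(b * b for b in to_element)))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angle <= fov_deg / 2  # within +/-30 degrees of gaze

# Element straight ahead and slightly right: inside the cone (~11 degrees)
print(in_golden_fov((0, 0, 1), (0.2, 0, 1.0)))   # True
# Element 90 degrees to the side: outside, would force a head turn
print(in_golden_fov((0, 0, 1), (1.0, 0, 0.0)))   # False
```

A runtime could use such a check to pull out-of-cone panels back in front of the user.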


source: dl.acm.org/doi/fullHtml/10.1145/3411764.3445349

4.1 Environment AI Model Training

4.1.1 Experiment with untrained AI

An untrained AI doesn't understand what an HDRI image is and how it should look.

 

An HDRI is a 360-degree image that captures all environmental data, including lighting. However, Midjourney mistakes "HDRI" for a rendered texture sphere, making the generated images unusable.

Even using a real HDRI image as an embedding did not improve the result.
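For context, the 2:1 aspect ratio comes from the equirectangular projection an HDRI uses: the width spans 360° of longitude and the height 180° of latitude. A minimal sketch of that pixel-to-direction mapping (the axis convention is an assumption; engines differ):

```python
import math

def equirect_to_direction(u: float, v: float):
    """Map normalized equirectangular coords (u, v in [0, 1]) to a unit
    view direction. u wraps 360 degrees of longitude, v spans 180 of latitude."""
    lon = (u - 0.5) * 2.0 * math.pi      # -pi .. pi
    lat = (0.5 - v) * math.pi            # +pi/2 (up) .. -pi/2 (down)
    return (math.cos(lat) * math.sin(lon),   # x: right
            math.sin(lat),                   # y: up
            math.cos(lat) * math.cos(lon))   # z: forward

# The image center looks straight ahead; the top row looks straight up
print(equirect_to_direction(0.5, 0.5))   # (0.0, 0.0, 1.0)
```

Any ratio other than 2:1 (or a "texture sphere" render) breaks this mapping, which is why the Midjourney outputs are unusable as environments.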


Generated with Midjourney text prompt:

an hdri texture from hdri haven, urban street --ar 2:1 --v 5.1


Generated with Midjourney image embedding and text prompt:

an hdri texture from hdri haven, urban street --ar 2:1 --v 5.1


Generated with Stable Diffusion txt to img + ControlNet:

an hdri texture from hdri haven, urban street

Using ControlNet can produce usable HDRI images, but the results are always too similar to the ControlNet embedding, even in reference-only mode.


ControlNet embedding

4.1.2 Train HDRI Stable Diffusion Model

After the initial tryouts with Midjourney and Stable Diffusion, it became clear that training a dedicated HDRI model is necessary for generating high-quality environments in XR.

The first model was trained with LoRA on 70 HDRIs downloaded from HDRI Haven.

Using the first model, more HDRIs were generated and put into the dataset for the second model.

The resulting second model worked quite well for generating outdoor HDRIs.
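As a minimal sketch of dataset preparation: LoRA trainers in the Stable Diffusion ecosystem (e.g. kohya-style scripts) commonly pair each training image with a same-named .txt caption file. The folder layout and caption wording below are illustrative assumptions, not the project's actual setup:

```python
from pathlib import Path

def write_captions(dataset_dir: str, base_caption: str) -> int:
    """Create a .txt caption next to every image so a LoRA trainer can
    pair them. Returns the number of captions written."""
    count = 0
    for img in sorted(Path(dataset_dir).glob("*.png")):
        # e.g. "modern_buildings_2_1k" -> "modern buildings 2 1k"
        subject = img.stem.replace("_", " ")
        img.with_suffix(".txt").write_text(f"{base_caption}, {subject}\n")
        count += 1
    return count

# Hypothetical usage:
# write_captions("hdri_dataset", "an outdoor hdri, equirectangular, 360 panorama")
```

The bootstrapping step described above then just repeats: generate with the first model, curate the outputs into this folder, and retrain.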


The trained model OutdoorHDRI

4.2 Image to Music

*Volume can be loud

Environment and music mapping test in Unreal Engine 5, Meta Quest Pro

Testing involved generating music from two AI-generated HDRIs using the 'Image to Music' API by fffiloni on Hugging Face.

 

The process utilizes CLIP Interrogator to caption the image and then employs Mubert to generate music based on the caption text.
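That two-stage flow can be sketched as a simple composition (the function names below are placeholders for illustration, not the actual fffiloni API; the real Space wires CLIP Interrogator and Mubert together server-side):

```python
def image_to_music(image_path, caption_model, music_model):
    """Two-stage pipeline: caption the image, then generate music from
    the caption. Both stages are injected so they can be swapped or stubbed."""
    caption = caption_model(image_path)   # stage 1, e.g. CLIP Interrogator
    return caption, music_model(caption)  # stage 2, e.g. Mubert text-to-music

# Stubbed run, showing only the data flow:
caption, track = image_to_music(
    "sunset_beach_hdri.png",
    caption_model=lambda p: "a serene beach at sunset, golden light",
    music_model=lambda c: f"<audio generated for: {c}>",
)
```

Because each stage runs a separate model call, latency adds up, which matches the slow generation noted below.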

 

Although the resulting music loop was satisfying, generation took considerable time and requires future improvement.

4.3 Create Zen

Hours of exploration in various environments and listening to the generated music revealed that something was still missing.

 

True meditative experiences involve a focal point, such as a crackling campfire or a tranquil waterfall. The juxtaposition of a dynamic element against a serene backdrop captivates the mind.

 

Inspired by this notion, a particle object that visualizes music beats was brought to life with Unreal's Niagara system.

First prototype of zen focus artifact, Unreal Engine 5.2, Meta Quest 2
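As a standalone illustration of driving a visual from music beats, here is a crude energy-threshold onset detector (the project itself does this inside Unreal with Niagara; this sketch, with assumed window size and threshold, only shows the underlying idea):

```python
def detect_beats(samples, window=1024, threshold=1.5):
    """Flag windows whose energy exceeds `threshold` times a running
    average energy -- a crude onset detector for pulsing a particle burst."""
    beats, avg = [], None
    for i in range(0, len(samples) - window, window):
        chunk = samples[i:i + window]
        energy = sum(s * s for s in chunk) / window
        if avg is not None and energy > threshold * avg:
            beats.append(i)  # beat detected at this sample index
        avg = energy if avg is None else 0.9 * avg + 0.1 * energy
    return beats

# A quiet signal with one loud burst: the burst is flagged as a beat
quiet = [0.01] * 4096
loud = [0.8] * 1024
print(detect_beats(quiet + loud + quiet))   # [4096]
```

Each flagged index would trigger a particle burst in the zen focus artifact.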

5 Iteration

Ongoing
