
Immerzen, create your AI-generated oasis of calm in XR
How it started
When I tried VR for the first time, I simply created an empty scene in Unity, added a sky material, and put on the VR headset.

Yet the sky looked so spectacular that, for a second, I thought I could feel the warmth of the sun shining down on me. The world, seemingly expanding to infinity, instantly lifted my spirits, and I just sat there, captivated, for a long time.
Reflecting on this experience later, I realized that for busy city dwellers with scarce personal time and limited nature exposure, a virtual retreat can provide solace, relieve anxiety and restore inner peace.
Possibility unlocked by generative AI
Prior to generative AI, constructing 3D environments relied heavily on 3D artists. While customization was possible to a certain extent, such as choosing and placing furniture in The Sims, the reliance on pre-made 3D models and 2D textures limited the diversity of worlds users could create.
With generative AI, users can now create a space that visualizes their own definition of a meditative environment.
Design process

With Apple's Vision Pro unveiled during development, this project takes a particular interest in UI prototyping and testing for spatial computing.
1. Competitor & Feature Analysis

2.1 Define needs and functions
After work, instead of meditating, I relax by watching YouTube, listening to music, and editing vacation photos.
Different people find relaxation in various ways. Just like meditation, Immerzen can be a new tool to help immerse oneself in moments of peace.

Using the well-received features from the competitor analysis as a guide, Immerzen follows three rules:
1. Keep functions as simple as possible.
2. Include music.
3. Base generation on the user's feeling/mood.

The list of functions:

- Environment (HDRI) generation
- Mood picker
- Music generation based on the generated environment and user's mood
- Save environment and music to collection
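
To make the "save to collection" function concrete, here is a minimal sketch of what one saved entry could look like. All names and fields below are hypothetical illustrations, not Immerzen's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical model of one saved Immerzen session; the field names
# are illustrative, not the app's actual schema.
@dataclass
class CollectionEntry:
    mood: str                # mood picked by the user, e.g. "calm"
    environment_prompt: str  # text prompt used to generate the HDRI
    hdri_path: str           # path to the generated 360° environment image
    music_path: str          # path to the generated music loop
    created_at: datetime = field(default_factory=datetime.utcnow)

entry = CollectionEntry(
    mood="calm",
    environment_prompt="an hdri texture, misty forest at dawn",
    hdri_path="collection/forest_dawn.hdr",
    music_path="collection/forest_dawn.mp3",
)
```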

2.2 User journey
Ver. 1
After a user test with AI-generated music samples, many testers expressed their dislike for the music. This highlighted the fact that people have distinct preferences when it comes to music.
To address this, the second version embeds a streaming service such as Apple Music to guide the music generation toward each individual's preferences.
Ver. 2

3. UI for spatial computing, insights from experiments
Insight 1: Think in 3D and use transparency for spacing
In XR, the limited FOV (field of view) makes traditional 2D spacing problematic: it spreads the UI out and puts interactables at the panel's edges out of comfortable reach.
Applying the same spacing in depth instead resolves this issue, creating separation between interactables while keeping the interface compact and reachable with minimal head and hand movement.
Transparent layers are crucial to leverage depth layering. The top layer should prioritize the most important function, standing out as the visual center when made opaque.

UI with traditional 2D spacing

UI with transparency and layering
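
As a rough, engine-agnostic sketch of this layout logic (the numbers are illustrative, not taken from the prototype): keep elements aligned in X/Y, push them apart along the view axis, and make only the focused layer opaque.

```python
def layout_layers(num_layers, base_distance=0.6, depth_step=0.08, focused=0):
    """Return (distance_m, opacity) for each UI layer, nearest first."""
    layers = []
    for i in range(num_layers):
        z = base_distance + i * depth_step       # meters in front of the user
        opacity = 1.0 if i == focused else 0.35  # only the focused layer is opaque
        layers.append((z, opacity))
    return layers

for z, alpha in layout_layers(3):
    print(f"layer at {z:.2f} m, opacity {alpha}")
```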
Insight 2: Focus on the 60° golden FOV
Before developing in XR, I imagined the XR interface spread all over the place, with me moving from one spot to another. A month in, I pictured tabs surrounding me in a 360° fashion, forcing me to constantly rotate, perhaps on a swivel chair.
However, after six months of prototyping and testing, it became evident that the majority of interfaces should remain within the user's 60° FOV, positioned directly in front of them.

The first observation from user testing is that people prefer minimal movement. Even holding a controller becomes tiring within minutes, let alone turning their heads or walking. Studies, such as the one by João Belo et al., support this finding and even propose a tool to measure interaction cost.

source: dl.acm.org/doi/fullHtml/10.1145/3411764.3445349
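
A back-of-the-envelope check of what the 60° rule means for panel sizing (my own sketch, not the study's measurement tool): a panel d meters ahead must be narrower than 2·d·tan(30°) to fit entirely inside a 60° horizontal FOV.

```python
import math

def max_panel_width(distance_m, fov_deg=60.0):
    """Widest flat panel at distance_m that still fits inside fov_deg horizontally."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg / 2.0))

for d in (0.5, 1.0, 1.5):
    print(f"{d:.1f} m away -> max panel width {max_panel_width(d):.2f} m")
# At 1 m the whole interface fits in a ~1.15 m wide panel, so nothing
# needs to wrap around the user.
```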
4.1 Environment AI Model Training
4.1.1 Experiment with untrained AI
An untrained AI doesn't understand what an HDRI image is or how it should look.
An HDRI is a 360° image that captures all environmental data, including lighting. Midjourney, however, mistakes renders of textured spheres for the HDRI format itself, making the generated images unusable.

Even using a real HDRI image as an embedding did not improve the result.

Generated with Midjourney text prompt:
an hdri texture from hdri haven, urban street --ar 2:1 --v 5.1

Generated with Midjourney image embedding and text prompt:
an hdri texture from hdri haven, urban street --ar 2:1 --v 5.1

Generated with Stable Diffusion txt to img + ControlNet:
an hdri texture from hdri haven, urban street
ControlNet can produce usable HDRI images, but the results are always too similar to the ControlNet reference image, even in reference-only mode.

ControlNet embedding
4.1.2 Train HDRI Stable Diffusion Model
After the initial tryouts with Midjourney and Stable Diffusion, it became clear that training a dedicated HDRI model is necessary for generating high-quality environments in XR.

Using LoRA, the first model was trained on 70 HDRIs downloaded from HDRI Haven.

With the first model, more HDRIs were generated and added to the dataset for the second model.

The resulting second model worked quite well for generating outdoor HDRIs.

The trained model OutdoorHDRI
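
For context, this is roughly how such a LoRA is applied at inference time with Hugging Face's diffusers library. The base model ID and LoRA path below are placeholders, not the actual OutdoorHDRI weights.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder base model and LoRA path, not the released OutdoorHDRI weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("outdoor_hdri_lora")

# A 2:1 aspect ratio matches the equirectangular HDRI format.
image = pipe(
    "an hdri texture from hdri haven, urban street",
    width=1024,
    height=512,
).images[0]
image.save("urban_street.png")  # LDR panorama; a true .hdr needs extra post-processing
```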
4.2 Image to Music
*Volume can be loud
Environment and music mapping test in Unreal Engine 5, Meta Quest Pro
Testing involved generating music from two AI-generated HDRIs using the 'Image to Music' API by fffiloni on Hugging Face.
The process utilizes CLIP Interrogator to caption the image and then employs Mubert to generate music based on the caption text.
Although the resulting music loop was satisfying, the generation time was considerable and needs improvement.
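
In code, the two stages look roughly like this. The CLIP Interrogator call follows that library's documented usage; the music request is a placeholder stand-in, since the real fffiloni Space wires the captioning and Mubert together server-side.

```python
import requests
from PIL import Image
from clip_interrogator import Config, Interrogator

# Stage 1: caption the generated HDRI with CLIP Interrogator.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
caption = ci.interrogate(Image.open("urban_street.png").convert("RGB"))

# Stage 2: hand the caption to a text-to-music service.
# Placeholder endpoint and parameters, not Mubert's real API.
response = requests.post(
    "https://example-music-service/generate",
    json={"prompt": caption, "duration": 60},
)
response.raise_for_status()
```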
4.3 Create Zen
Hours of exploration in various environments and listening to the generated music revealed that something was still missing.
True meditative experiences involve a focal point, such as a crackling campfire or a tranquil waterfall. The juxtaposition of a dynamic element against a serene backdrop captivates the mind.
Inspired by this notion, a particle object that visualizes music beats was brought to life with Unreal Engine's Niagara system.
First prototype of zen focus artifact, Unreal Engine 5.2, Meta Quest 2
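
The beat-to-particle mapping can be sketched offline like this; in the prototype the pulses drive a Niagara emitter, while this standalone version (with a hypothetical file name) just prints the burst schedule.

```python
import librosa

# Hypothetical generated loop; any audio file works here.
y, sr = librosa.load("forest_dawn.mp3")

# Detect beats, then convert beat frames to timestamps.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"tempo ~{float(tempo):.0f} BPM")
for t in beat_times[:8]:
    print(f"particle burst at {t:.2f} s")
```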
5. Iteration
Ongoing