Eye wide open

01 Jul., 2024

 


Multi-genre artist Mr Eyeball aims to help people answer their everyday existential dilemmas

By Catherine Shu / STAFF REPORTER



Mr Eyeball casts a steady, unwavering gaze on the human condition, and not just because he has no eyelids.

In the past eight years the prolific artist has worked in multiple arenas, including directing, choreography, writing, singing, acting, illustration and fine art, and has performed in England, Japan, the US and China. Mr Eyeball's resume includes a one-man (or one-eye) exhibition at the Museum of Contemporary Art Taipei, crossover projects with Converse and Swatch, and a line of T-shirts, tote bags and toys available at the Red House Theater. His published work ranges from art books to an illustrated series called Xiang Tai Duo ("think too much"), which is regularly excerpted in Apple Daily. In his spare time, Mr Eyeball serves as stylist to the stars; pop singers Big S, otherwise known as Barbie Hsu, and Little S, otherwise known as Dee Hsu, as well as Ricky Hsiao, have worn his outrageous creations in performances or on the red carpet.

The anthropomorphized organ is the brainchild of Chen Po-wei, a former theater set and costume designer who launched the Mr Eyeball brand.

Mr Eyeball's art has shifted along with Chen's interests and target audience. In the beginning, Chen says, his approach was much darker and Mr Eyeball worked primarily in performance art, drawing on Chen's theatrical background. His first book, Eyeball Loves the Globe, was filled with photographs of dark scenarios that looked like Hieronymus Bosch-Salvador Dali-Cindy Sherman mash-ups.

But Mr Eyeball has since lightened up. The Xiang Tai Duo series has brightly colored illustrations of children romping in animal costumes; Mr Eyeball now gears much of his work toward a younger audience, appearing at comic conventions and doing outreach work at elementary schools impacted by Typhoon Morakot.

Mr Eyeball's artistic output, however, continues to explore the same themes. The kids in the Xiang Tai Duo series pose questions like "what is the meaning of existence?" to readers. On Mr Eyeball's latest album, This World, he sings about the transcendence of happiness. In the end, Mr Eyeball just wants people to turn their gaze inwards, says Chen, and contemplate life's little existentialist questions.

Taipei Times: Mr Eyeball can come across as a little scary and a lot of your past work has mixed cuteness with dark elements, but ultimately it seems like he has a very idealistic, upbeat approach to life.

Chen Po-wei: When I first started out, my style was a lot more direct, but the message was the same: life can be happy and colorful, but at the same time it is often difficult and filled with sorrow. On my first album cover there was a drawing of Mr Eyeball looking cheerful and happy, but in the back illustration he's chopped his arms off. The songs on that record were like that: half were happy and half were darker. The message was that sadness doesn't mean that good times won't come again, and happiness doesn't mean everything will always work out.

The new album, This World, is different in tone but it covers the same themes. There is a photo of Mr Eyeball on the front and of me without the mask on the back, but most people don't know it's me, because I don't go out in public often as myself. But this picture of me looks a little blue, both literally in the color and in the feeling it portrays. I think that's more like how I am in private, because I'm more introverted. The front cover, however, is when I wear the Mr Eyeball mask and become this character. I'm livelier and more energetic. This World is subtler than my earlier work, but the message is always that no matter how you feel at the moment, happiness and sadness are both part of the same universe and you have to face it.

TT: What inspired you to make an eyeball into a character?

CP: I've always liked the art of the Surrealists, especially Dali. A lot of their art used body parts like eyes or lips to symbolize different concepts. I really liked that element of fantasy, and as someone who was a little shy and quiet, I felt attracted to art that could express multiple meanings.

When I created my brand, I had to think of a logo that would express what I was trying to do with it. I thought, I already use a lot of eyeballs in my art and they are very flexible artistically. They connote many different things and that was useful in the beginning, when my work was more abstract.

There's a Chinese saying that if you close one eye, you will see things more clearly. I thought this saying is a bit narrow, because I think most people are actually too focused on one thing and they don't want to see things in context. We have tunnel vision, because we have goals like wanting to own a home in 10 years or attracting someone we like. But even if you work really hard on something, you aren't guaranteed to get it. So I think the message of the saying should be that people have to keep one eye on the world and close one eye to look inwards if they want to be truly fulfilled.

TT: Your series, Xiang Tai Duo, is a lot warmer and gentler in feel than your earlier books and performance art, but it continues to cover the same themes.

CP: The illustrations are sweet, but I write the books for adults, too, not just kids. When you look at the text, it's not as simple as you'd assume. I think people who have had more life experience will get more out of the books. It's like when a cartoon character has an angel and a devil on his shoulders, prompting him and pulling him in two directions. The books are meant to be kind of like that. We all have voices inside of us, one telling us we should be happier and the other asking, if I'm not happy, then what do I need to do to be more content?

The books also try to get the point across that everyone thinks about these things. [Entrepreneur and billionaire] Terry Gou and Jay Chou also deal with these issues. Sometimes people wonder, "Am I the only one who feels this way? Where is the meaning in my life? What are my passions?"

TT: I see this phrase a lot in your work: "I am a human being, I am also an alien." What does that mean?

CP: When people talk about aliens, no one is sure what they look like, and we make up our own fantasies of what they are. A lot of times aliens are pictured as being spooky, like ghosts, but of course we also don't know what ghosts look like.

So the meaning of that sentence is that "there are times when I'm like you, and there are also times when I am completely unlike you." Sometimes people feel a deep sense of kinship because they have a few things in common, but as soon as they discover a difference, they suddenly feel completely alienated from one another. That's why there are so many religious conflicts: it's hard to reconcile spiritual differences. Or you're black, I'm white; you're from the West, I'm from Asia; we work for competing companies ... all these differences can make people feel like they come from different planets.

The point I'm trying to make is that ultimately we're all the same. We're all human beings. You don't have to split people up into groups. Sometimes people in Taiwan say Aboriginal people are lazy. Or when Asian people travel abroad, sometimes they feel threatened when they see black people. People like slapping labels on one another. But the point is that we have more in common than not. Just because we have differences doesn't mean that we can't communicate, and just because we have things in common doesn't mean we'll get along.

TT: You've done more performances for younger audiences recently. How do kids react to Mr Eyeball? Are some of them freaked out?

CP: No, actually, and that's partly because the eyeball mask has changed. At first it was designed to look like a real eyeball, with blood vessels, so it was a lot scarier. Now it's like a cartoon; it even has rosy cheeks. Mr Eyeball's movements have also changed. At first when I wore the Mr Eyeball mask, I wasn't really into accompanying it with cute movements. I wore things like business suits to go along with it. But last year we were at a comic convention, and there I wore a suit made out of children's fabric, with cartoon characters all over it. So we've definitely changed, and we've started to reach out to kids and teenagers.

Also, it has to do with a change in my own interests. At first I wanted Mr Eyeball to be a cool character, but when you work with children and you leap into a room and say, "hi kids, how are you?" you instantly feel younger, too. I wanted to be different and cool, but now, because of this change in direction, I think it's easier for a general audience to accept Mr Eyeball and also for kids not to be scared. Maybe they think, "you look weird, but you can still play with us and make us laugh."

A few years ago I went to England to perform at an event and a little boy asked to take his photo with me. Afterward, his mom told me that this was probably only the third time he'd ever asked to take a photo with someone, because he's very shy, so she was very surprised. And I thought, I have no idea what's going on in that kid's head, but I can see that taking a photo with me is something he wants to do. There's a Chinese saying that your outer appearance is an extension of how you feel on the inside. It wasn't my intention at first, but now that we work with kids more, I've started to do things that I think they will find interesting and fun, so even if they don't know who Mr Eyeball is, they'll still think he's cute. I've never met a child who is scared of Mr Eyeball.


On the Net: www.eyeball.com.tw


MotionBooth: Motion-Aware Customized Text-to-Video Generation

Overall pipeline. The overall pipeline of MotionBooth is illustrated in Fig. 2. During the training stage, MotionBooth learns the appearance of the given subject by fine-tuning the T2V model. To prevent overfitting, we introduce a video preservation loss and a subject region loss in Section 3.2. Additionally, we propose a subject token cross-attention (STCA) loss in Fig. 4 to explicitly connect the subject tokens with the subject's position on the cross-attention maps, facilitating the control of subject motion. Camera and subject motion control are performed during the inference stage. We manipulate the cross-attention maps by amplifying the subject tokens and their corresponding regions while suppressing other tokens in Section 3.3. This ensures that the generated subjects appear in the desired positions. By training on the cross-attention map, the STCA loss enhances the subjects' motion control. For camera movement, we introduce a novel latent shift module that shifts the noised latent directly, achieving smooth camera movement in the generated videos in Section 3.4.

Task formulation. We focus on generating motion-aware videos featuring a customized subject. To customize video subjects, we fine-tune the T2V model on a specific subject. This process can be accomplished with just a few (typically 3-5) images of the same subject. During inference, the fine-tuned model generates motion-aware videos of the subject. The motion encompasses both camera and subject movements, which are freely defined by the user. For camera motion, the user inputs the horizontal and vertical camera movement ratios, denoted as $\mathbf{c}_{cam}=[c_{x},c_{y}]$. For subject motion, the user provides a bounding box sequence $[\mathbf{B}_{1},\mathbf{B}_{2},\ldots,\mathbf{B}_{L}]$ to indicate the desired positions of the subject, where $L$ represents the video length. Each bounding box specifies the x-y coordinates of the top-left and bottom-right points for each frame. By incorporating these conditional inputs, the model is expected to generate videos that include a specific subject, along with the predefined camera movements and subject motions.
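To make these conditional inputs concrete, the sketch below shows one way the camera ratios $\mathbf{c}_{cam}$ and the bounding box sequence $[\mathbf{B}_{1},\ldots,\mathbf{B}_{L}]$ could be represented in Python. This is only an illustration of the data the user supplies; the names CameraMotion and make_linear_boxes are hypothetical and not part of MotionBooth's released code.

from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical containers for the user-supplied motion controls described above.
@dataclass
class CameraMotion:
    cx: float  # horizontal camera movement ratio c_x
    cy: float  # vertical camera movement ratio c_y

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) of top-left / bottom-right, normalized

def make_linear_boxes(start: Box, end: Box, length: int) -> List[Box]:
    """Linearly interpolate a bounding box sequence [B_1, ..., B_L] over the video length L."""
    return [
        tuple(s + (e - s) * i / max(length - 1, 1) for s, e in zip(start, end))
        for i in range(length)
    ]

# Example: the subject drifts from the left to the right of the frame over 16 frames,
# while the camera pans slightly to the right.
cam = CameraMotion(cx=0.2, cy=0.0)
boxes = make_linear_boxes((0.05, 0.3, 0.35, 0.7), (0.6, 0.3, 0.9, 0.7), length=16)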

3.2 Subject Learning

Figure 3: Case study on subject learning. "Region" indicates the subject region loss; "Video" indicates the video preservation loss. The images are extracted from generated videos.

Given a few images of a subject, previous works have demonstrated that fine-tuning a diffusion model on these images can effectively learn the appearance of the subject [39, 23, 8, 10, 40, 44]. However, two significant challenges remain. First, due to the limited size of the dataset, the model quickly overfits the input images, including their backgrounds, within a few steps. This overfitting of the background impedes the generation of videos with diverse scenes, a problem also noted in previous works [39, 12]. Second, fine-tuning T2V models using images can impair the model's inherent ability to generate videos, leading to severe background degradation in the generated videos. To illustrate these issues, we conducted a toy experiment. As depicted in Fig. 3, without any modifications, the model overfits the background of the subject image. To address this, we propose computing the diffusion reconstruction loss solely within the subject region. However, even with this adjustment, the background in the generated videos remains over-smoothed. This degradation likely results from tuning a T2V model exclusively with images, which damages the model's original weights for video generation. To mitigate this, we incorporate video data as preservation data during training. Training with video data but without the subject region loss still suffers from background overfitting; only the full combination, MotionBooth, generates videos with detailed and diverse backgrounds.

Preliminary. T2V diffusion models learn to generate videos by reconstructing noise in a latent space [39, 29, 51, 12]. The input video is first encoded into a latent representation $\mathbf{z}_0$. Noise $\epsilon$ is added to this latent representation, resulting in a noised latent $\mathbf{z}_t$, where $t$ represents the timestep. This process simulates the reverse process of a fixed-length Markov chain [38]. The diffusion model $\epsilon_{\theta}$ is trained to predict this noise. The training loss, which is a reconstruction loss, is given by:

$$\mathcal{L}=\mathbb{E}_{\mathbf{z},\,\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}),\,t,\,\mathbf{c}}\left[\left\|\epsilon-\epsilon_{\theta}(\mathbf{z}_{t},\mathbf{c},t)\right\|^{2}_{2}\right], \tag{1}$$

where $\mathbf{c}$ is the conditional input used in classifier-free guidance methods, which can be text or a reference image. During inference, a pure noise $\mathbf{z}_T$ is gradually denoised into a clean latent $\mathbf{z}^{\prime}_0$, where $T$ is the length of the Markov chain. The clean latent is then decoded back into RGB space to generate the video $\mathbf{X}^{\prime}$.
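In code, Eq. 1 reduces to a mean-squared error between the sampled noise and the model's prediction. The following PyTorch-style sketch is illustrative only: eps_theta stands in for the T2V denoiser and scheduler.add_noise for the forward noising process; both are assumed placeholder interfaces rather than MotionBooth's actual API.

import torch
import torch.nn.functional as F

def diffusion_recon_loss(eps_theta, z0, cond, scheduler):
    """Eq. 1: E[ || eps - eps_theta(z_t, c, t) ||_2^2 ] (placeholder interfaces)."""
    b = z0.shape[0]
    t = torch.randint(0, scheduler.num_train_timesteps, (b,), device=z0.device)
    eps = torch.randn_like(z0)              # eps ~ N(0, I)
    z_t = scheduler.add_noise(z0, eps, t)   # forward (noising) process
    eps_pred = eps_theta(z_t, cond, t)      # predicted noise, conditioned on c
    return F.mse_loss(eps_pred, eps)        # squared L2 error, averaged over elements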

Subject region loss. To address the challenge of overfitting backgrounds in training images, we propose a subject region loss. The core idea is to calculate the diffusion reconstruction loss exclusively within the subject region, thereby preventing the model from learning the background. Specifically, we first extract the subject mask for each image. This can be done manually or through automatic methods, such as a segmentation model. In practice, we use SAM [27] to collect all the masks. The subject region loss is then calculated as follows:

$$\mathcal{L}_{sub}=\mathbb{E}_{\mathbf{z},\,\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}),\,t,\,\mathbf{c}}\left[\left\|\left(\epsilon-\epsilon_{\theta}(\mathbf{z}_{t},\mathbf{c}_{i},t)\right)\cdot\mathbf{M}\right\|^{2}_{2}\right], \tag{2}$$

where $\mathbf{M}$ represents the binary masks for the training images. These masks are resized to the latent space to compute the dot product. $\mathbf{c}_i$ is a fixed sentence in the format "a [V] [class name]," where "[V]" is a rare token and "[class name]" is the class name of the subject [39]. We have found that with the subject region loss, the trained model effectively avoids the background overfitting problem.
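The masked reconstruction of Eq. 2 can be sketched as follows; the image-space mask is resized to the latent resolution before weighting the per-element error. This is a minimal sketch under the same placeholder interfaces as above, not the released implementation.

import torch
import torch.nn.functional as F

def subject_region_loss(eps_theta, z0, cond_i, mask, scheduler):
    """Eq. 2: reconstruction loss restricted to the subject region M.

    z0   : clean image latent, shape (B, C, H, W)
    mask : binary subject mask, shape (B, 1, H_img, W_img), e.g. produced by SAM
    """
    b = z0.shape[0]
    t = torch.randint(0, scheduler.num_train_timesteps, (b,), device=z0.device)
    eps = torch.randn_like(z0)
    z_t = scheduler.add_noise(z0, eps, t)
    eps_pred = eps_theta(z_t, cond_i, t)

    # Resize the mask to the latent spatial size; broadcasting covers the channel dim.
    m = F.interpolate(mask.float(), size=z0.shape[-2:], mode="nearest")
    masked_err = (eps - eps_pred) * m       # zero out everything outside the subject
    return masked_err.pow(2).mean()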

Video preservation loss. Image customization datasets like DreamBooth [39] and CustomDiffusion [29] provide excellent examples of multiple images from the same subject. However, in the customized video generation task, directly fine-tuning the video diffusion model on images leads to significant background degradation. Intuitively, this image-based training process may harm the original knowledge embedded in video diffusion models. To address this, we introduce a video preservation loss designed to maintain video generation knowledge by joint training with video data. Unlike the class-specific preservation data used in previous works [39, 51], we utilize common videos with captions, denoted as $\mathbf{c}_v$. Our experiments in Section 4 demonstrate that common videos are more effective for subject learning and preserving video generation capabilities. The loss function is formulated as follows:

$$\mathcal{L}_{vid}=\mathbb{E}_{\mathbf{z},\,\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}),\,t,\,\mathbf{c}}\left[\left\|\epsilon-\epsilon_{\theta}(\mathbf{z}_{t},\mathbf{c}_{v},t)\right\|^{2}_{2}\right]. \tag{3}$$

Figure 4: Case study on subject token cross-attention maps. (b) and (c) are visualizations of the cross-attention maps for the tokens "[V]" and "dog".

Subject token cross-attention loss. To control the subject's motion, we directly manipulate the cross-attention maps during inference. Since we introduce a unique token, "[V]", in the training stage and associate it with the subject, we need to link this special token to the subject's position within the cross-attention maps. As illustrated in Fig. 4, fine-tuning the model does not effectively connect the unique token to the cross-attention maps. Therefore, we propose a Subject Token Cross-Attention (STCA) loss to guide this process explicitly. First, we extract the cross-attention map $\mathbf{A}$ at the tokens "[V] [class name]". We then apply a binary cross-entropy loss to ensure that the corresponding attention map is larger at the subject's position and smaller outside this region. This process incorporates the subject mask and can be expressed as:

$$\mathcal{L}_{stca}=-\left[\mathbf{M}\log(\mathbf{A})+(1-\mathbf{M})\log(1-\mathbf{A})\right]. \tag{4}$$
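Eq. 4 is a binary cross-entropy between the cross-attention map at the subject tokens and the resized subject mask. In the sketch below, attn_map is assumed to have been extracted already (e.g. via attention hooks on the backbone) and to contain values in [0, 1]; the function name and tensor shapes are illustrative.

import torch.nn.functional as F

def stca_loss(attn_map, mask, eps=1e-6):
    """Eq. 4: -[ M log A + (1 - M) log(1 - A) ], averaged over elements.

    attn_map : cross-attention map A at the "[V] [class name]" tokens, shape (B, 1, h, w)
    mask     : binary subject mask M, shape (B, 1, H_img, W_img)
    """
    m = F.interpolate(mask.float(), size=attn_map.shape[-2:], mode="nearest")
    a = attn_map.clamp(eps, 1.0 - eps)      # keep log() finite
    return F.binary_cross_entropy(a, m)     # mean BCE, matching Eq. 4 up to averaging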

During training, the overall loss function is defined as:

$$\mathcal{L}=\mathcal{L}_{sub}+\lambda_{1}\mathcal{L}_{vid}+\lambda_{2}\mathcal{L}_{stca}, \tag{5}$$

where $\lambda_1$ and $\lambda_2$ are hyperparameters that control the weights of the different loss components.
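Assembled into one training step, Eq. 5 simply sums the three terms with the weights $\lambda_1$ and $\lambda_2$. The sketch below reuses the placeholder functions from the earlier sketches (diffusion_recon_loss serves for the video preservation term of Eq. 3, which has the same form as Eq. 1 but is conditioned on the video captions $\mathbf{c}_v$); the default weights and the batch dictionary layout are assumptions, not values from the paper.

def motionbooth_training_step(eps_theta, scheduler, subject_batch, video_batch,
                              lambda1=1.0, lambda2=1.0):
    """One training step implementing Eq. 5 (illustrative weights and batch layout)."""
    # Subject learning on a few subject images, masked to the subject region (Eq. 2).
    l_sub = subject_region_loss(eps_theta, subject_batch["z0"], subject_batch["cond"],
                                subject_batch["mask"], scheduler)

    # Video preservation on common captioned videos (Eq. 3).
    l_vid = diffusion_recon_loss(eps_theta, video_batch["z0"], video_batch["cond"],
                                 scheduler)

    # Subject token cross-attention loss (Eq. 4); attn_map comes from attention hooks.
    l_stca = stca_loss(subject_batch["attn_map"], subject_batch["mask"])

    return l_sub + lambda1 * l_vid + lambda2 * l_stca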
