I cloned AI Trump’s voice onto a Cantonese song by Terence Lam, and the result is worth discussing (a cross-language experiment with so-vits-svc)

Austin Yip
4 min read · Jul 31, 2023


If you are not interested in any of the discussions but only the results, please scroll down directly to the single YouTube video in this article.

A few weeks ago, I was experimenting with cloning myself to assess whether AI Austin Yip could sing better than the real Austin. The outcome was quite promising, and it led me to believe that with further refinement, I might create a mega-Austin capable of exceptional singing.

The question

However, while I was working on replicating my own voice, a question came to mind. What if I took a model trained entirely on a different language and applied it to Cantonese? Would the cloned voice sound authentic, or would it simply resemble a non-Cantonese speaker attempting to speak Cantonese?

The simplest way to find out is through experimentation. Considering the abundance of online models, I opted for an iconic one that also boasts a well-sampled library — Trump’s voice.

I downloaded Trump’s model from Hugging Face, which offers two distinct versions: one with 18.5k epochs and another with 68k epochs. I am sincerely grateful to the individual who trained this model, as creating one from scratch would have consumed days of my time, not to mention the additional effort required to clean up the recordings.

Methodology

A mockup CD cover for the experiment

My next step was to make AI Trump sing Terence Lam’s song Remember (林家謙《記得》). The procedure differs slightly from my last experiment. The following is a step-by-step walkthrough of how I arrived at the final result.

  1. I downloaded the music version of Terence Lam (林家謙)’s song Remember (記得) from YouTube and imported it into my DAW.
  2. I sang the song in Cantonese.
  3. I exported my singing in .wav format and imported it into so-vits-svc (see how I did it last time).
  4. I converted my singing with Trump’s model. For the pitch (F0) predictor, I employed Dio instead of Crepe, even though I have come across numerous accounts stating that Crepe yields superior results. In my experience, Crepe consistently produces a cracking sound.
  5. I placed the inferred version back into the DAW and created the recording.
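One possible way to look into the cracking mentioned in step 4 is to check whether a predictor's pitch track contains sudden jumps. The following is only a minimal sketch under that assumption: the experiment does not confirm that the cracking comes from pitch-tracking errors. The function name `semitone_jumps`, the 7-semitone threshold, and the sample numbers are all hypothetical. The input is assumed to be a list of per-frame F0 values in Hz, with 0 meaning unvoiced, however you obtain it.

```python
import math


def semitone_jumps(f0_hz, threshold_semitones=7.0):
    """Return frame indices where the pitch moves by more than the threshold
    between two consecutive voiced frames. Frames with f0 <= 0 are unvoiced."""
    jumps = []
    for i in range(1, len(f0_hz)):
        prev, cur = f0_hz[i - 1], f0_hz[i]
        if prev <= 0 or cur <= 0:
            continue  # ignore transitions that involve an unvoiced frame
        interval = abs(12.0 * math.log2(cur / prev))  # distance in semitones
        if interval > threshold_semitones:
            jumps.append(i)
    return jumps


# Hypothetical F0 tracks in Hz (made-up numbers, not taken from the recording).
track_a = [220.0, 221.0, 0.0, 222.0, 223.0, 224.0]
track_b = [220.0, 221.0, 0.0, 222.0, 446.0, 224.0]  # one-frame octave error

print(semitone_jumps(track_a))
print(semitone_jumps(track_b))
```

Running the same check on the F0 tracks from two predictors for the same phrase would show whether one of them jumps more often, which could then be compared by ear with where the cracking occurs.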

Results

The results have been uploaded to YouTube. If there are any instances of copyright violation, please inform me, and I will promptly remove the content.

Discussion

To begin with, I believe that any native Cantonese speaker would agree that this DOES NOT sound native at all. Understanding the reasons behind that is what I most want to learn from this experiment. Below, I list my observations. Discussion is welcome.

  1. Vocal placement
    Compared with how I would sing (or speak) the song, AI Trump’s vocal placement is a lot further back than mine.
  2. Tonal inflection (or tone contours)
    This is perhaps one of the most challenging aspects to master in Cantonese. While listening to AI Trump’s version, I noticed that when he sings tone 1 (dark flat 陰平) or tone 4 (light flat 陽平), he sounds significantly better than with other tones. For instance, around 00:38, when AI Trump sings “離開的臉,” the 開(hoi1) and 的(dik1) are pronounced very well. In contrast, the 臉(lim5) is comparatively weaker.
  3. Consonants
    One of the most recognizably non-native elements in AI Trump’s singing is the consonants. At 00:40–00:44, where he sings “飛走了難再遇見”, the 再(zoi3) is sung with a buzzing “zh” sound.
  4. Vowels
    The other thing I noticed in the recording is how the vowels are altered. For instance, at 00:48 when he sings “八月晚風輕輕吹過多恬靜,” the 吹(ceoi1) is sung with a more closed vowel, whereas we would typically sing it with a more open one. Similarly, at 03:03 when he sings “請記得誰和誰掠過,” the 掠(loek6) is sung with a more open vowel before it closes up at the end.
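To make the tone numbers in the examples above easier to compare, here is a small sketch that splits a Jyutping syllable into its base and tone number and looks up a commonly cited Chao pitch value for that tone. The pitch values are approximate textbook figures (sources differ, for example 55 or 53 for tone 1, and 21 or 11 for tone 4), not measurements from the recording. The helper names `parse_jyutping` and `contour_shape` are hypothetical.

```python
import re

# Commonly cited Chao pitch values for the six Cantonese tones (Jyutping numbering).
# These are approximate reference values, not measurements from the recording.
CHAO = {1: "55", 2: "25", 3: "33", 4: "21", 5: "23", 6: "22"}


def parse_jyutping(syllable):
    """Split a Jyutping syllable such as 'lim5' into (base, tone, chao_value)."""
    match = re.fullmatch(r"([a-z]+)([1-6])", syllable)
    if match is None:
        raise ValueError(f"not a Jyutping syllable with a tone number: {syllable!r}")
    base, tone = match.group(1), int(match.group(2))
    return base, tone, CHAO[tone]


def contour_shape(chao):
    """Describe a two-digit Chao value as level, rising, or falling."""
    start, end = int(chao[0]), int(chao[-1])
    if start == end:
        return "level"
    return "rising" if start < end else "falling"


# Syllables quoted in the observations above.
for syllable in ["hoi1", "dik1", "lim5", "zoi3", "ceoi1", "loek6"]:
    base, tone, chao = parse_jyutping(syllable)
    print(f"{syllable}: tone {tone}, Chao {chao}, {contour_shape(chao)}")
```

With these reference values, 開(hoi1) and 的(dik1) come out as level tones and 臉(lim5) as a rising one, which lines up with the impression in observation 2. That is only a pattern in a few examples, not an explanation.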

Further Discussion

While the above examples are only a few instances from the experiment, there are likely many more aspects worthy of discussion among composers, singers, conductors, and linguists. In the discussion section, I have solely presented my observations. Currently, I have no insight into why these occurrences transpired or how I could transform them into something more beneficial for my artistic pursuits.

The results, however, have proven to be quite helpful in understanding the differences between two languages with which I am very familiar. I am highly likely to continue experimenting with them to further explore and learn from their nuances.

I hope this experiment proves useful to individuals interested in languages, and that it sparks further discussions on voice-cloning and cross-cultural studies.

p.s. If you are interested in the original singing by Terence Lam, please click here.

If you are interested in my study on so-vits-svc, please read the first two parts of the experiment here:

  1. I cloned my own voice to create an AI-Austin Yip using my Mac Studio
  2. I was not happy with my last AI-cloned Austin Yip, and I fixed the problem

You may also check out my other works at www.austinyip.com, or follow me at www.instagram.com/austinyip_thecomposer

Happy music making.
