Naver (CEO Han Seong-sook) said on the 6th that it presented nine papers at INTERSPEECH 2021, a speech and signal processing conference held from August 30 to September 3.
Counting the papers presented by LINE, Naver's Japanese affiliate, the total comes to 14.
Building on these results, Naver plans to actively pursue global AI leadership.
Now in its 22nd year, INTERSPEECH is where global companies such as Google, Facebook, Amazon, and Alibaba share their latest speech recognition technology, and it is recognized alongside ICASSP as a leading speech and signal processing conference.
Naver and LINE also presented 14 papers at ICASSP 2021, held in June this year.
'ÀÎÅͽºÇÇÄ¡ 2021' äÅÃÇÑ ³×À̹ö ³í¹® ¸®½ºÆ® |
1. High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model
- Min-Jae Hwang, Ryuichi Yamamoto (LINE), Eunwoo Song, Jae-Min Kim
- Applies the multi-band harmonic-plus-noise model, a speech modeling technique, to Parallel WaveGAN to improve the vocoder's quality and stability.
2. LiteTTS: A Decoder-free Light-weight Text-to-wave Synthesizer Based on Generative Adversarial Networks
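- Kim Nguyen (Yonsei University), Kihyuk Jeong (Yonsei University), Seyun Um (Yonsei University), Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang (Yonsei University)
- Proposes a text-to-wave model that merges the TTS pipeline, normally split into text-to-feature and feature-to-wave stages, into a single model.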
3. Label Embedding for Chinese Grapheme-to-Phoneme Conversion
- Eunbi Choi (KAIST), Hwa-Yeon Kim, Jong-Hwan Kim, Jae-Min Kim
- Proposes a label embedding approach for the Chinese polyphone conversion problem.
4. Look Who’s Talking: Active Speaker Detection in the Wild
- You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
- Builds and publicly releases a multimodal active speaker detection dataset.
5. Adapting Speaker Embeddings for Speaker Diarisation
- Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung
- Studies a method for strengthening speaker embeddings to improve speaker diarization performance.
6. Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
- Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee
- Proposes an overlapped speech detection technique that uses three classes and a CRNN.
7. Graph Attention Networks for Anti-Spoofing
- Hemlata Tak (EURECOM), Jee-weon Jung, Jose Patino (EURECOM), Massimiliano Todisco (EURECOM), Nicholas Evans (EURECOM)
- Studies audio spoofing detection using a GNN.
8. DEMUCS-Mobile : On-device lightweight speech enhancement
- Lukas Lee, Youna Ji, Minjae Lee, Min-Seok Choi
- Proposes a technique for slimming down a CNN-based deep learning noise suppression model, yielding a noise suppression model that can run in real time on-device on mobile hardware.
9. Layer Pruning on Demand with Intermediate CTC
- Jaesong Lee, Jingu Kang, Shinji Watanabe (CMU)
- A training methodology for pruning Transformer+CTC models without fine-tuning.
‘ÀÎÅͽºÇÇÄ¡ 2021’¿¡¼ äÅÃµÈ ³×À̹öÀÇ ¿¬±¸ ³í¹®Àº À½¼ºÀνÄ, À½¼ºÇÕ¼º, µ¥ÀÌÅͼ Á¦ÀÛ µî À½¼º ¹× ½ÅÈ£¿Í °ü·ÃµÈ ´Ù¾çÇÑ ºÐ¾ß¸¦ Æ÷°ýÇÑ´Ù.
ÀϺΠ³í¹®Àº ³×À̹öÀÇ ½ÇÁ¦ ¼ºñ½º¿¡ Àû¿ëµÇ¸ç »ç¿ëÀÚ¿¡°Ô ´õ¿í Æí¸®ÇÑ ¼ºñ½º °æÇèÀ» Á¦°øÇÏ°í ÀÖ´Ù.
À½¼º ÇÕ¼º Ç°Áú Çâ»ó ±â¼ú ¿¬±¸´Â ‘Ŭ·Î¹Ù´õºù’, ‘Ŭ·Î¹Ù ½º¸¶Æ® ½ºÇÇÄ¿’, ‘³×À̹ö AiCALL’ µî ³×À̹ö Ŭ·Î¹Ù¿¡¼ Á¦°øÇÏ´Â ´Ù¾çÇÑ À½¼º ÇÕ¼º ¼ºñ½º¿¡ Àû¿ëµÆ´Ù.
¿©·¯ ÈÀÚ°¡ µ¿½Ã¿¡ ¸»ÇÏ´Â »óȲ¿¡¼ ‘ÈÀÚ ºÐÇÒ(Speaker Diarisation)’À» À§ÇØ °¢ ¹ßÈÀÚÀÇ Æ¯Â¡À» ´õ¿í Á¤È®ÇÏ°Ô ÇнÀÇÏ´Â ±â¹ý¿¡ ´ëÇÑ ¿¬±¸´Â ‘Ŭ·Î¹Ù³ëÆ®’ ¼ºñ½º °³¼± µî¿¡ È°¿ëµÇ°í ÀÖ´Ù.
In addition, five of the nine papers Naver presented were carried out as joint research with companies and research institutions in Korea and abroad that lead AI technology, including EURECOM, a French research institute regarded as a front-runner in computer science and information systems, as well as Carnegie Mellon University, KAIST, Yonsei University, and LINE.
Naver added that its efforts to build a global AI research and innovation ecosystem are producing visible results.
Building on these results, Naver plans to keep expanding its global AI R&D ecosystem.
Beyond Japan, where its work centers on its partnership with LINE, Naver has established joint research centers with HUST and PTIT in Vietnam, and in Europe it continues research cooperation with NAVER LABS Europe in France.
In July, it also agreed to establish a joint research center with the University of Tübingen in Germany in the fields of artificial neural networks and robotics.
Jung Suk-geun, head of Naver Clova CIC, said, "Backed by active investment in R&D, Naver's AI research and innovation ecosystem keeps growing in scale," adding, "Just as Naver, LINE, and global AI researchers achieved outstanding results at INTERSPEECH 2021, we expect new results to emerge from the increasingly diverse AI R&D ecosystem that will be built around Naver."
Reporter Kim Dong-gi, kdk@bikorea.net
© BI KOREA. Unauthorized reproduction and redistribution prohibited.