School of Science and Technology 科技學院
Computing Programmes 電腦學系

Vertical Language Model Based on SingSharp’s Knowledge Base

LEUNG Chun Hung Jonas, Kuai Le, YI Yicheng 

Programme: Bachelor of Computing with Honours in Internet Technology; Bachelor of Science with Honours in Computer Science
Supervisor: Dr. Roy Li
Areas: Intelligent Applications
Year of Completion: 2024

Objectives

Project Aim

The aim of this project is to implement a Retrieval-Augmented Generation (RAG) framework that enhances the effectiveness of the LLaMA-2 model within the singing domain. Using Semantic Textual Similarity (STS) techniques, the system retrieves the most relevant context from a specialized knowledge base on singing.

Project Objectives

The goal is to develop a specialized Retrieval-Augmented Generation (RAG) system using LLaMA-2 for singing-related queries. Key objectives include: 

  1. Build a diverse singing Q&A database covering different genres, skill levels, and techniques. 
  2. Enable effective retrieval using Semantic Textual Similarity (STS) techniques to extract contextually relevant content from the knowledge base. 
  3. Fine-tune the LLaMA-2 model to generate accurate and context-aware responses tailored to vocal training and singing technique questions. 
  4. Conduct robust evaluation and testing to measure the system's accuracy, responsiveness, and adaptability in the singing domain. 

Altogether, the project focuses on making singing-related AI support both intelligent and highly user-specific.

Methodologies and Technologies used

Overview of Solution

The project utilizes a combination of LLaMA-2, Retrieval-Augmented Generation (RAG), and Semantic Textual Similarity (STS) to build a system that delivers accurate, context-aware responses focused on singing techniques.

Key Goals: 
  • Enhance the precision and relevance of AI-generated answers for vocal training. 
  • Support a wide range of users—from beginners to professional singers. 
  • Ensure scalability across sub-domains like classical singing, modern styles, and vocal health. 

This section introduces the architecture and technical methods supporting this solution, which is designed to handle singing-related queries more effectively through advanced natural language understanding.

Data Collection and Augmentation 

The project's data collection phase focuses on creating a robust, domain-specific knowledge base for singing. The content is structured into 58 detailed paragraphs, grouped under five thematic units: 

  1. Vocal Anatomy and Physiology 
  2. Vocal Production and Technique
  3. Sound Characteristics and Adjustment
  4. Vocal Styles and Types 
  5. Resonance and Adjustments 

To enhance this resource, ChatGPT (GPT-3.5 Turbo) was used to generate 10 Q&A pairs per paragraph, resulting in 580 pairs in total (58 paragraphs × 10 pairs). This strategy helps the system respond accurately and contextually to a broad range of singing-related queries, from basic concepts to advanced vocal techniques.
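The augmentation step can be sketched as follows. This is a minimal illustration, not the project's actual script: the prompt wording, the `Q:`/`A:` reply format, and the function names are assumptions, and the call to the chat model itself is left out.

```python
import re

NUM_PARAGRAPHS = 58        # from the text: 58 knowledge-base paragraphs
PAIRS_PER_PARAGRAPH = 10   # from the text: 10 Q&A pairs per paragraph


def build_generation_prompt(paragraph: str, n_pairs: int = PAIRS_PER_PARAGRAPH) -> str:
    """Build a prompt asking a chat model for Q&A pairs (wording is hypothetical)."""
    return (
        f"Read the passage below and write {n_pairs} question-answer pairs "
        "that can be answered from the passage alone.\n"
        "Format each pair as:\nQ: <question>\nA: <answer>\n\n"
        f"Passage:\n{paragraph.strip()}"
    )


def parse_qa_pairs(model_reply: str) -> list[tuple[str, str]]:
    """Extract (question, answer) tuples from a reply in the assumed Q:/A: format."""
    pattern = re.compile(r"Q:\s*(.+?)\s*\nA:\s*(.+?)\s*(?=\nQ:|\Z)", re.DOTALL)
    return [(q.strip(), a.strip()) for q, a in pattern.findall(model_reply)]


# Expected size of the augmented set if every paragraph yields a full batch.
expected_total = NUM_PARAGRAPHS * PAIRS_PER_PARAGRAPH
```

Parsing the reply into explicit pairs makes it easy to check that each paragraph really produced the expected number of pairs before the manual review step.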

 

To further strengthen the system's performance, the LLaMA-2 model was fine-tuned on databricks-dolly-15k, an open dataset of roughly 15,000 human-written instruction-response pairs. This addition was intended to improve the model's ability to:

  • Understand natural language nuances 
  • Generate more fluent, contextually relevant, and human-like responses 
  • Perform better on domain-specific queries related to singing 

This dataset integration also enhances the model's compatibility with the existing framework, making it more practical for real-world interaction. 
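Fine-tuning on an instruction dataset usually starts by flattening each record into one training string. The sketch below assumes the published databricks-dolly-15k field names (`instruction`, `context`, `response`); the `###` section markers are a hypothetical template, since the project's actual prompt format is not given.

```python
def format_dolly_record(record: dict) -> str:
    """Turn one databricks-dolly-15k style record into a single training string."""
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    response = record["response"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # the context field may be empty, so include it only when present
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Illustrative record (made up for this example, not taken from the dataset).
example = {
    "instruction": "What is belt in singing?",
    "context": "",
    "response": "A technique used to create a loud, powerful sound.",
}
training_text = format_dolly_record(example)
```

Keeping the context section optional lets the same template serve both closed-book records and records that come with a reference passage, which mirrors how retrieved context is later supplied at inference time.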

Quality Control and Ongoing Enhancement 

Every Q&A pair is manually reviewed and refined for clarity and accuracy. There are also plans for continuous expansion of the knowledge base, including new AI-generated content, ensuring the system evolves alongside users' learning needs. 

System Design 

Prototype System Design
  1. User Query Input: A user submits a question to the system.
  2. Semantic Textual Similarity: The question and the knowledge base undergo semantic textual similarity analysis. This process assesses how closely the user’s query aligns with the content in the knowledge base.
  3. Knowledge Selection: Based on the results of semantic textual similarity, the system reduces the knowledge base, focusing on the most pertinent information.
  4. RAG Technology: The question, accompanied by the streamlined knowledge base, is processed using RAG technology. RAG combines retrieval and generation methods to retrieve relevant information from the knowledge base and generate a response.
  5. Pretrained Language Model: The RAG-generated response undergoes further refinement using the pre-trained language model LLaMA-2. This model possesses general language understanding and generation capabilities, contributing to the quality and coherence of the final answer.
  6. Answer Generation: The final answer is generated and presented to the user.

Figure 1: Design of the System
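The retrieval part of this flow (similarity scoring, knowledge selection, and prompt assembly) can be sketched in a few lines. This is only an illustration under stated assumptions: the bag-of-words `embed` function is a toy stand-in for the project's BERT-based similarity model, the two knowledge-base passages are made up, and the prompt wording is hypothetical. The call to LLaMA-2 is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top1(query: str, knowledge_base: list[str]) -> str:
    """Score every passage against the query and keep the single best one."""
    q = embed(query)
    return max(knowledge_base, key=lambda passage: cosine(q, embed(passage)))


def build_prompt(query: str, context: str) -> str:
    """Combine the question with the selected context for the generator."""
    return (
        "Answer the question using only the context.\n\n"
        f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical two-passage knowledge base, for illustration only.
knowledge_base = [
    "Belt is a singing technique that creates a loud, powerful sound.",
    "A break is a sudden change between vocal registers.",
]
question = "What is belt?"
prompt = build_prompt(question, retrieve_top1(question, knowledge_base))
```

Because only the best-scoring passage reaches the prompt, the generator sees the singing sense of "belt" rather than having to guess between a vocal technique and an item of clothing.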

The system prioritizes backend performance, emphasizing computational efficiency and response precision through a modular and scalable architecture. 

Key Architectural Elements: 

  • Built in Python, using the Hugging Face Transformers library. 
  • Integrates BERT for query classification and semantic similarity analysis to accurately interpret user inputs. 
  • Uses LLaMA-2 for generating high-quality, context-aware text responses. 
  • PyTorch is the core backend framework, enabling efficient model training and execution. 
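The source does not say how BERT outputs were turned into a single vector for similarity scoring. One common approach is masked mean pooling: average the per-token vectors while skipping padding positions. The sketch below shows the idea with plain Python lists so it runs without any model; in a real pipeline the same operation would normally be applied to the tensor of per-token hidden states returned by the encoder. The function name is hypothetical.

```python
def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the vectors of real tokens, skipping padding (mask == 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors; the last one is padding and must not affect the result.
sentence_vector = mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [1, 1, 0],
)
```

Ignoring padded positions matters because queries and knowledge-base passages differ in length, and unmasked padding would pull short inputs toward the same vector.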

Development Approach: 

  • Started with architectural design and environment setup. 
  • Focused on seamless integration between the classification model and the knowledge base. 
  • Applied iterative testing and optimization to ensure robust performance across diverse user queries and data loads. 

Example of Generation

What is belt?

A2 (LLaMA-2) What are the different types of belts? 

Types of Belt Materials 

1. Leather Belts: Leather belts come in various styles, colors and designs made from genuine leather or synthetic materials such as pebbled or embossed leather. They offer durability, style versatility for both casual and formal occasions. 

2. Fabric Belts: Fabric belts have become increasingly popular due to their lightweight comfort factor being an alternative option compared with heavier leather counterparts while still delivering quality performance at affordable prices; they’re ideal suede jacquard wool blends weaves cotton twill silk hemp chiffon organza laminated nylon canvas spandex elastic lace ribbons more! These fabrications provide unique textures visual effects allowing endless styling possibilities fashion forward dressy elegant business professional athleisure looks alike no matter what you prefer! 

3. Buckled Belts: This type consists of two metal pieces connected by a spring-loaded buckle that fastens around your waist; offering stability support comfort throughout long periods without compromising mobility providing structure when needed yet easily removable without undoing all work done during day activities involving physical labor outdoor pursuits sports & fitness training travel adventures etcetera making them perfect addition wardrobe essentials everyone who enjoys stay active outside work office environments (especially those living areas prone extreme weather conditions). The most common materials used for these kinds include stainless steel brass silver titanium copper zinc aluminum alloy carbon fiber glass fibers wood bamboo plastics recycled bike tubes old car parts recovered from scrap yards salvaged tools DIY craft supply shops junkyards scrapyards garage sales secondhand stores thrift shops vintage markets flea markets yard sales estate auctions garage liquidation tag sales rummage sales consignment boutiques resale shops discount retail stores wholesalers distributors factories direct manufacturers importers exporters trading companies commodity brokers middlemen wholesaler-retailer networks dropshippers online marketplaces eBay Amazon Alibaba etc.). 

 

A2 (ChatGPT) A belt is a flexible band or strap, typically made of leather or a synthetic material, that is worn around the waist. The main purpose of a belt is to secure or hold up clothing, primarily trousers or other articles of clothing that might be loose-fitting. Belts are also used as fashion accessories, where they can add a stylistic touch to an outfit. They usually feature a buckle at one end, which helps in adjusting the length to fit the wearer’s waist and can be designed in various styles, widths, and colors. Additionally, belts can serve other functions, such as carrying tools or weapons in certain professions or uniforms. 

 

A2 (RAG) Belt refers to the technique used in singing to create a loud, powerful sound by tilting the cricoid cartilage. 

 

A2 (Finetune + RAG) A belt is a strip of leather, plastic material (sometimes cotton cloth) with metal buckles at both ends for fastening around the waist. Belt comes from middle English word betel; this came ultimately from French from Germanic base *butal- 'belt', whence Gothic butliz, Old Norse beital etc., cf. Old Church Slavonic bytoлi ‘cloth’ + Lithuanian baltos ‘a piece of clothing’, via Latin butillum, ultimately derived from Proto-Indo European *bul- (see buy). Its first known use was in Middle English period beginning after mid 12th century (“A Dictionary of First Names”). The term originally referred only to military sashes, which passed through the hole (ring) worn round one leg above the knee rather than going over it.) Originally made purely of raw materials, most belts are today finished products made using machine methods or manually tailored fabric. The most common types seen everyday life include fashion belt buckle – often decorated and typically manufactured exclusively from plastics like polypropylene or polyester; and sports belt buckle made entirely out of stainless steel for running shoes and other athletic uses due mainly because they provide durable protection against corrosion during physical activity; high heels for women; jeans low rise styles; suspenders worn together when combined as part pantsuit ensemble, including pleats instead of buttons down front side seamline ( 

 

What is the issue with a ‘break’ for performers? 

A3 (LLaMA-2) It was recently announced that there will be an indefinite “break” for K-pop group BTS, which has sparked discussion among fans and industry experts about the potential consequences of taking time off from performing. Here are some possible reasons why breaks can be problematic for performers: 1. Burnout and exhaustion: Continuous performances without adequate rest and recovery can lead to physical and mental burnout, causing fatigue, stress, and a decrease in overall wellbeing. Taking a break allows performers to recharge their energy levels and prevent burnout. 2. Creative stagnation: Without sufficient downtime, creativity and inspiration may suffer, leading to repetitive and unoriginal work. A break provides an opportunity for artists to explore new ideas, experiment with different styles, or take time to reflect on their artistry. 3. Lack of momentum: An extended absence from live performances or releases can result in lost steam and traction, making it challenging to regain audience interest once activity resumes. Artists might miss out on opportunities to engage with fans, promote new material, or build upon recent successes during this period. 4. Financial impacts: Taking a hiatus could potentially negatively affect an artist’s income stream, as they would cease generating earnings from tour dates, merchandise sales, or other activities related to their craft. This might make it more difficult to invest in future projects or sustain themselves financially while away. 5. Perception and expectations: Fans may perceive an extended absence from the scene as indicative of declining popularity or a lack of commitment to their craft. Negative perceptions can lead to decreased support and enthusiasm for the artist upon their return. In conclusion, taking a “break” can have both positive and negative effects depending on various factors. 
It’s crucial for performers to carefully consider their reasons for stepping back, manage their expectations, and strategize their comeback accordingly to maintain long-term stability and growth within the entertainment industry. 

 

A3 (ChatGPT) A belt is a flexible band or strap, typically made of leather or a synthetic material, that is worn around the waist. The main purpose of a belt is to secure or hold up clothing, primarily trousers or other articles of clothing that might be loose-fitting. Belts are also used as fashion accessories, where they can add a stylistic touch to an outfit. They usually feature a buckle at one end, which helps in adjusting the length to fit the wearer’s waist and can be designed in various styles, widths, and colors. Additionally, belts can serve other functions, such as carrying tools or weapons in certain professions or uniforms. 

 

A3 (RAG) For singers and actors, a break can lead to a loss of control over their vocal register, resulting in changes in tone or quality that are difficult to correct during performance. Additionally, breaks may cause weakening of the voice throughout specific parts of a pitch range, leading to less consistent delivery. In some cases, breaks may even result in vocal strain or damage if proper care is not taken to address them before they become more pronounced issues. By recognizing when a “break” occurs and taking steps to manage it effectively, performers can maintain better control over their voices and prevent long-term damages from occurring. 

 

A3 (Finetune + RAG) The issue with a break from the perspective of performer is that it can lead to uncontrolled changes in their vocals while performing. Performers may not realize they are transitioning into another register during performance because there could also be other issues impacting the singer such as fatigue, acid reflux etc., which might compound problems for the singer leading them unaware that they have entered another register. In the best case scenario, if the singer realizes when she enters another vocal register (for example mid-sentence), he/she would quickly adapt, but most likely will lose her flow completely if left unattended until completion; ultimately dropping valuable momentum built up before this moment due to time spent trying out new notes & vowels without proper training or practice available at any given point within song structure! It’s important therefore for all singers –from beginner amateur hobbyists right down professionals–to understand how many parts make up our voices so we know what kind of work goes into putting together great soundscapes when producing music videos or live performances too! By mastering these techniques you'll find yourself feeling more comfortable making your own decisions about vocal production like never before – giving way towards greater confidence overall when it comes creating music either alone or alongside others! So don’t give up just yet though – keep working hard no matter where u start out🎉cause YOU got THIS!💪. 

Conclusion

The project successfully implemented a Retrieval-Augmented Generation (RAG) system powered by LLaMA-2 and enhanced with Semantic Textual Similarity (STS) to provide accurate, context-aware answers within the singing domain. Key achievements included: 

  • Building a specialized knowledge base. 
  • Fine-tuning LLaMA-2 for domain relevance. 
  • Enabling efficient content retrieval and generating high-quality responses. 
  • Establishing and meeting robust evaluation benchmarks. 

Limitations 

  • The knowledge base, while rich, lacked full coverage for rare or complex singing topics. 
  • The system was unable to handle multi-turn dialogue, limiting conversational continuity. 
  • Use of top-1 context retrieval restricted the model’s response depth. 

Future Development

  • Shift to a top-k retrieval approach to synthesize insights from multiple sources. 
  • Expand the knowledge base to include wider and deeper content across all singing subfields. 
  • Improve context management for smoother, ongoing conversations. 
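The proposed move from top-1 to top-k retrieval could look roughly like the sketch below. It is an illustration only: the word-count `similarity` function is a toy stand-in for the project's STS score, and the function names, the default `k`, and the numbered context layout are assumptions.

```python
import heapq
import math
import re
from collections import Counter


def similarity(a: str, b: str) -> float:
    """Toy cosine similarity over word counts (stand-in for the project's STS score)."""
    va, vb = (Counter(re.findall(r"[a-z']+", t.lower())) for t in (a, b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the query, best first."""
    return heapq.nlargest(k, passages, key=lambda p: similarity(query, p))


def merge_contexts(passages: list[str]) -> str:
    """Join the retrieved passages into one numbered context block."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
```

With `k=1` this reduces to the current top-1 behaviour, so the change can be introduced behind a single parameter and evaluated at different values of `k`.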