[Unity] Trying Out Google STT (Speech-to-Text)

내공부방 2024. 10. 25. 22:55

This page covers how to apply Google STT (Speech-to-Text) in Unity.

It walks through converting speech to text with Google's STT API, using short code samples and explanations.

1. Preparing the Google Cloud STT API

Open the Google Cloud Console and create a project
Enable the Speech-to-Text API
Issue an API key or an OAuth 2.0 access token

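Enable STT in Google Cloud the same way as for TTS, which is described in the earlier post "[Unity] Trying Out Google TTS (Text-to-Speech)" linked below.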

https://developer-growth-history.tistory.com/45

2. Code for applying Google STT in Unity

(1) Google URL

https://speech.googleapis.com/v1/speech:recognize

- This is the Google STT endpoint. Send the recognition request to this address.
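When authenticating with an API key, Google APIs take the key as a `key` query parameter on the endpoint. The sketch below shows one way to build that URL. `SttEndpoint` and `BuildUrl` are hypothetical names chosen for illustration, not part of any SDK.

```csharp
using System;

public static class SttEndpoint
{
    // Endpoint taken from the post.
    public const string RecognizeUrl = "https://speech.googleapis.com/v1/speech:recognize";

    // Hypothetical helper: appends the API key as the "key" query parameter.
    public static string BuildUrl(string apiKey)
    {
        if (string.IsNullOrWhiteSpace(apiKey))
            throw new ArgumentException("API key must not be empty.", nameof(apiKey));

        // Escape the key so stray characters cannot break the query string.
        return RecognizeUrl + "?key=" + Uri.EscapeDataString(apiKey.Trim());
    }
}
```

The result can be passed straight to `UnityWebRequest` as the request URL. Keep the real key out of source control.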

(2) Unity Code

1) Defining the DTO classes

- Write DTO classes to hold the Google STT response data

[System.Serializable]
public class GoogleSpeechResponse
{
    public Result[] results;
}

[System.Serializable]
public class Result
{
    public Alternative[] alternatives;
}

[System.Serializable]
public class Alternative
{
    public string transcript;
}
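Google orders each result's alternatives with the most probable one first, so in practice only `alternatives[0]` is usually needed. The sketch below joins the top transcript of every result and tolerates missing arrays, which can happen when no speech was recognized. The DTOs are repeated only so the snippet compiles on its own, and `SpeechResponseText` and `JoinTopTranscripts` are hypothetical names.

```csharp
using System;
using System.Text;

[Serializable] public class Alternative { public string transcript; }
[Serializable] public class Result { public Alternative[] alternatives; }
[Serializable] public class GoogleSpeechResponse { public Result[] results; }

public static class SpeechResponseText
{
    // Hypothetical helper: concatenates the first alternative of each result.
    public static string JoinTopTranscripts(GoogleSpeechResponse response)
    {
        if (response == null || response.results == null) return string.Empty;

        var sb = new StringBuilder();
        foreach (var result in response.results)
        {
            // Skip results that carry no alternatives at all.
            if (result?.alternatives == null || result.alternatives.Length == 0) continue;

            string text = result.alternatives[0]?.transcript;
            if (string.IsNullOrWhiteSpace(text)) continue;

            if (sb.Length > 0) sb.Append(' ');
            sb.Append(text.Trim());
        }
        return sb.ToString();
    }
}
```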

2) Microphone input and voice recording

- Record from the microphone for 10 seconds at 16 kHz, then convert the audio data

- Base64-encode the recorded data and send it to the Google STT API

// Requires: using System.Collections; using UnityEngine; using Newtonsoft.Json;
IEnumerator StartRecording()
{
    // Record from the default microphone (10 seconds, 16 kHz sampling)
    AudioClip recording = Microphone.Start(null, false, 10, 16000);
    if (recording == null)
    {
        Debug.LogError("Failed to start the microphone.");
        yield break;
    }

    // Wait until the 10-second clip has been filled
    yield return new WaitForSeconds(10);

    // Stop recording
    Microphone.End(null);

    // Extract the audio data (samples is per channel, so multiply by the channel count)
    float[] samples = new float[recording.samples * recording.channels];
    recording.GetData(samples, 0);

    // Convert the audio data to a byte array
    byte[] audioBytes = ConvertAudioClipToByteArray(samples);

    // Bail out if the conversion produced no data
    if (audioBytes == null || audioBytes.Length == 0)
    {
        Debug.LogError("Audio data conversion failed.");
        yield break;
    }

    // Base64 encoding
    string base64Audio = System.Convert.ToBase64String(audioBytes);

    // Build the JSON request body
    var request = new
    {
        config = new
        {
            encoding = "LINEAR16",
            sampleRateHertz = 16000,
            languageCode = "ko-KR"  // Korean
        },
        audio = new
        {
            content = base64Audio
        }
    };

    string requestJson = JsonConvert.SerializeObject(request);
    yield return StartCoroutine(SendRequest(requestJson));
}
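Because the whole clip is sent inline as Base64, it helps to know how large the request body gets. The arithmetic for the settings above (10 seconds, 16 kHz, mono, 16-bit) can be sketched as follows. `AudioPayloadSize` is a hypothetical helper name.

```csharp
using System;

public static class AudioPayloadSize
{
    // Raw LINEAR16 size: 2 bytes per sample per channel.
    public static long Pcm16Bytes(int seconds, int sampleRateHz, int channels = 1)
        => (long)seconds * sampleRateHz * channels * 2;

    // Base64 emits 4 characters for every 3 input bytes, rounded up.
    public static long Base64Length(long byteCount)
        => 4 * ((byteCount + 2) / 3);
}
```

For the recording in this post, `Pcm16Bytes(10, 16000)` gives 320,000 bytes and `Base64Length(320000)` gives 426,668 characters, so the JSON body is a little over 400 KB.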

3) Converting the audio data to a byte array

- This code converts the float audio samples into 16-bit little-endian PCM bytes, which are then Base64-encoded and sent

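byte[] ConvertAudioClipToByteArray(float[] samples)
{
    int sampleCount = samples.Length;
    int byteCount = sampleCount * 2;  // 2 bytes per 16-bit sample
    byte[] bytes = new byte[byteCount];

    for (int i = 0; i < sampleCount; i++)
    {
        // Clamp to [-1, 1] so the cast to short cannot overflow
        float clamped = Mathf.Clamp(samples[i], -1f, 1f);
        short sample = (short)(clamped * short.MaxValue);

        // Little-endian: low byte first, then high byte
        bytes[i * 2] = (byte)(sample & 0xff);
        bytes[i * 2 + 1] = (byte)((sample >> 8) & 0xff);
    }

    return bytes;
}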
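One way to sanity-check the conversion is to decode the bytes back into floats and compare them with the input. The sketch below is a Unity-free version of the same encoding plus a matching decoder. `Pcm16`, `Encode`, and `Decode` are hypothetical names, and `Math.Max`/`Math.Min` stand in for `Mathf.Clamp`.

```csharp
using System;

public static class Pcm16
{
    // Float samples in [-1, 1] -> 16-bit little-endian PCM.
    public static byte[] Encode(float[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);
            bytes[i * 2] = (byte)(value & 0xff);
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xff);
        }
        return bytes;
    }

    // 16-bit little-endian PCM -> float samples in [-1, 1].
    public static float[] Decode(byte[] bytes)
    {
        var samples = new float[bytes.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Recombine low and high bytes; the cast restores the sign.
            short value = unchecked((short)(bytes[i * 2] | (bytes[i * 2 + 1] << 8)));
            samples[i] = value / (float)short.MaxValue;
        }
        return samples;
    }
}
```

The round trip loses a little precision because each sample is truncated to 16 bits, so comparisons should use a small tolerance rather than exact equality.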

4) Sending the request to the Google STT API

- Use UnityWebRequest to send a POST request to the Google STT API and handle the response

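// Requires: using System.Collections; using System.Text; using UnityEngine; using UnityEngine.Networking;
// apiUrl must end with "?key=" (https://speech.googleapis.com/v1/speech:recognize?key=)
// so that appending apiKey yields a valid URL.
IEnumerator SendRequest(string json)
{
    using (UnityWebRequest www = new UnityWebRequest(apiUrl + apiKey, "POST"))
    {
        byte[] bodyRaw = Encoding.UTF8.GetBytes(json);
        www.uploadHandler = new UploadHandlerRaw(bodyRaw);
        www.downloadHandler = new DownloadHandlerBuffer();
        www.SetRequestHeader("Content-Type", "application/json");

        yield return www.SendWebRequest();

        if (www.result == UnityWebRequest.Result.ConnectionError || 
            www.result == UnityWebRequest.Result.ProtocolError)
        {
            Debug.LogError("Error: " + www.error + "\nResponse: " + www.downloadHandler.text);
        }
        else
        {
            var response = JsonUtility.FromJson<GoogleSpeechResponse>(www.downloadHandler.text);

            // results can be missing when no speech was recognized
            if (response == null || response.results == null)
            {
                Debug.Log("STT returned no results.");
                yield break;
            }

            foreach (var result in response.results)
            {
                if (result.alternatives == null) continue;
                foreach (var alternative in result.alternatives)
                {
                    Debug.Log("STT result: " + alternative.transcript);
                }
            }
        }
    }
}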

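3. Things to keep in mind

Microphone.Start() is not supported in WebGL builds.

To implement STT on WebGL you have to use the browser's Web Audio API or an interface between JavaScript and Unity, so the next post will walk through a WebGL STT example.

There are three possible approaches, and the example will use the first one:

Handle microphone input in JavaScript and pass it to Unity (Jslib)
Implement recording with WebRTC or the Web Audio API
Upload the recorded file to a server and run STT there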

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)
