How to pass an edited WAV between functions without saving the WAV in between?

2022-01-21 · tags: python, converters, audio, wav, base64

Problem description

I have a WAV recording of a conversation between two people (a customer and tech support). I have three separate functions that extract one voice, cut 10 seconds of it, and transform it into an embedding.

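```python
import scipy.io.wavfile as wf
# VoiceActivityDetection comes from the poster's own VAD code; its import is not shown in the post.


def get_customer_voice(file):
    print('getting customer voice only')
    sr, data = wf.read(file)  # (sample_rate, samples)
    # A mono file yields a 1-D array, so check ndim before reading shape[1].
    if data.ndim == 1 or data.shape[1] == 1:
        raise ValueError('expected a stereo file with the customer on its own channel')
    c1 = data[:, 1]  # customer channel (column index 1)
    vad = VoiceActivityDetection()
    vad.process(c1)
    voice_samples = vad.get_voice_samples()
    # This is the trouble: how to pass voice_samples on without saving it anywhere as a WAV?
    wf.write('%s_customer.wav' % file, sr, voice_samples)
```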
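
One way to avoid the intermediate file is to have the first step return the sample rate and samples instead of calling `wf.write`, and let the next step take them as arguments. The sketch below shows that shape using only the standard library's `wave` and `array` modules. It assumes 16-bit PCM stereo input with the customer on channel index 1 (as in the code above); the name `extract_channel` is hypothetical, and the VAD step is left out.

```python
import array
import io
import wave


def extract_channel(wav_source, channel=1):
    """Return (sample_rate, samples) for one channel of a 16-bit PCM WAV.

    wav_source may be a path or a binary file-like object such as io.BytesIO.
    The samples are returned to the caller instead of being written to disk.
    """
    with wave.open(wav_source, 'rb') as w:
        if w.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit PCM')
        n_channels = w.getnchannels()
        if n_channels < 2:
            raise ValueError('expected a stereo file with one speaker per channel')
        sample_rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    interleaved = array.array('h')  # 16-bit signed integers
    interleaved.frombytes(frames)
    # Frames are interleaved (ch0, ch1, ch0, ch1, ...), so a stepped slice picks one channel.
    return sample_rate, interleaved[channel::n_channels]
```

A caller can then hand `sample_rate` and the returned samples directly to the VAD and cutting steps, e.g. `sr, c1 = extract_channel('call.wav')` (the file name is only an example).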

The function below cuts the first 10 seconds of the WAV file produced by the function above.

```python
import sys

from pydub import AudioSegment


def get_customer_voice_10_seconds(file):
    voice = AudioSegment.from_wav(file)
    new_voice = voice[0:10000]  # pydub slices in milliseconds: first 10 seconds
    file = str(file) + '_10seconds.wav'
    new_voice.export(file, format='wav')


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('give wav file to process!')
    else:
        print(sys.argv)
        get_customer_voice_10_seconds(sys.argv[1])
```
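
pydub slices an `AudioSegment` in milliseconds, so `voice[0:10000]` is the first 10 seconds. If the previous step hands over raw samples instead of a file, the same cut is a plain slice of `sample_rate * seconds` samples. A minimal sketch; `first_seconds` is a hypothetical helper name and a mono signal is assumed.

```python
def first_seconds(samples, sample_rate, seconds=10):
    """Return the first `seconds` of a mono sample sequence, without touching disk.

    Corresponds to pydub's segment[0:seconds * 1000] for a mono signal;
    inputs shorter than the requested length are returned whole.
    """
    n = int(sample_rate * seconds)
    return samples[:n]
```

Because slicing works on lists, `array.array` and NumPy arrays alike, the result can be passed straight to the embedding step.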

How can I pass it on as a WAV (or some other format) without saving it to a directory? This will be used in a REST API and I don't know where it would save that WAV, so ideally the audio should be passed between the functions directly.
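
If a later step (or a REST response) really needs WAV-formatted data rather than raw samples, the WAV container can be built in memory with `io.BytesIO` instead of a file on disk. A minimal sketch using the standard library's `wave` module, assuming 16-bit mono PCM; `samples_to_wav_bytes` and `wav_bytes_to_samples` are hypothetical helper names. pydub's `AudioSegment.export` also accepts a file-like object such as `io.BytesIO`, which can serve the same purpose.

```python
import array
import io
import wave


def samples_to_wav_bytes(samples, sample_rate):
    """Encode 16-bit mono samples as a complete WAV file held in memory."""
    pcm = array.array('h', samples)
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()  # bytes of a full WAV file (header + data)


def wav_bytes_to_samples(data):
    """Decode in-memory 16-bit mono WAV bytes back into (sample_rate, samples)."""
    with wave.open(io.BytesIO(data), 'rb') as w:
        sample_rate = w.getframerate()
        pcm = array.array('h')
        pcm.frombytes(w.readframes(w.getnframes()))
    return sample_rate, pcm
```

The bytes returned by `samples_to_wav_bytes` can be sent as an HTTP response body or handed to any function that expects a WAV file-like object (wrap them in `io.BytesIO`).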

Solution

I figured it out: the function below works without saving anything to disk or using a buffer. It receives a WAV file, edits it, and sends the result straight to the embedding function (`get_embedding`):

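```python
from pydub import AudioSegment
from scipy.io.wavfile import read  # assumed to be scipy's WAV reader
# VoiceActivityDetection and get_embedding are the poster's own code (not shown in the post).


def get_customer_voice_and_cutting_10_seconds_embedding(file):
    print('getting customer voice only')
    sr, data = read(file)  # (sample_rate, samples)
    c1 = data[:, 1]  # customer channel
    vad = VoiceActivityDetection()
    vad.process(c1)
    voice_samples = vad.get_voice_samples()
    # Wrap the raw samples in an in-memory AudioSegment; nothing is written to disk.
    audio_segment = AudioSegment(
        voice_samples.tobytes(),
        frame_rate=sr,
        sample_width=voice_samples.dtype.itemsize,
        channels=1,
    )
    audio_segment = audio_segment[0:10000]  # first 10 seconds
    # Pass the segment itself; this assumes get_embedding accepts audio rather than a file path.
    return get_embedding(audio_segment)
```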

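The key is calling `tobytes()` when constructing the `AudioSegment`: it turns the array of voice samples into raw bytes, which `AudioSegment` (with `channels=1`) assembles back into a single track in memory.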
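
To see why `tobytes()` is enough: when `AudioSegment` is given raw data, it only needs the sample bytes plus `sample_width`, `frame_rate` and `channels`. The standard-library sketch below mimics that with `array.array`, whose `tobytes()` dumps the samples as one contiguous run of raw bytes, as NumPy's `ndarray.tobytes()` does. `describe_raw_pcm` is a hypothetical helper, and 16-bit mono samples are assumed.

```python
import array


def describe_raw_pcm(samples, sample_rate):
    """Return the pieces a raw-data AudioSegment is built from, for a mono track.

    raw          -- samples serialised back to back as bytes (what tobytes() gives)
    sample_width -- bytes per sample (the dtype itemsize; 2 for 16-bit 'h')
    duration_ms  -- track length implied by the sample count and frame rate
    """
    pcm = array.array('h', samples)  # 16-bit signed samples
    raw = pcm.tobytes()
    sample_width = pcm.itemsize
    duration_ms = 1000 * len(pcm) / sample_rate
    return raw, sample_width, duration_ms
```

With one channel, the byte stream alone defines the track: `len(raw) == len(samples) * sample_width`, and the samples can be recovered with `array.array('h').frombytes(raw)`.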
