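It has been nearly three years since ChatGPT burst onto the scene in late 2022, and AI tools have been growing explosively the whole time. It's about time to start a post taking stock of the useful AI tools in each field. This post focuses on the following areas: computer vision, natural language processing, and speech processing.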

Overview

Speech Processing

  • insanely-fast-whisper: transcribes the speech in a video to text straight from the CLI
    • With this, I can finally drop ByteDance's Jianying (剪映) for good
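
To make the "straight from the CLI" point concrete, here is a minimal sketch of driving the tool from Python. It assumes the `insanely-fast-whisper` executable is installed and on PATH, and that it accepts the `--file-name` and `--transcript-path` options; please check `insanely-fast-whisper --help` for your installed version. The helper names `build_command` and `transcribe`, and the file name `talk.mp4`, are hypothetical and only for illustration.

```python
import shutil
import subprocess


def build_command(media_path, transcript_path="output.json"):
    """Build the argument list for one transcription run.

    Assumption: the CLI takes --file-name and --transcript-path;
    verify against `insanely-fast-whisper --help`.
    """
    return [
        "insanely-fast-whisper",
        "--file-name", media_path,
        "--transcript-path", transcript_path,
    ]


def transcribe(media_path, transcript_path="output.json"):
    """Run the CLI and return the path of the JSON transcript it writes."""
    if shutil.which("insanely-fast-whisper") is None:
        raise RuntimeError("insanely-fast-whisper was not found on PATH")
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_command(media_path, transcript_path), check=True)
    return transcript_path
```

Calling `transcribe("talk.mp4")` should leave an `output.json` in the working directory, which is the file the SRT script in the appendix reads.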

Appendix

Converting insanely-fast-whisper output to SRT subtitles

Adapted from a repository: partly copied, partly written myself.

import json
import os

json_filename = 'output.json'
srt_filename = 'output.srt'

# Load the transcript JSON written by insanely-fast-whisper.
with open(json_filename, encoding='utf-8') as json_file:
    outputs = json.load(json_file)


def seconds_to_srt_time_format(prev, seconds):
    """Return (last valid timestamp, 'HH:MM:SS,mmm' string).

    If `seconds` is not a number (e.g. None), fall back to `prev`.
    """
    if not isinstance(seconds, (int, float)):
        seconds = prev
    else:
        prev = seconds
    # Work in whole milliseconds to avoid float truncation (e.g. 2.3 -> 299 ms).
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, milliseconds = divmod(total_ms, 1000)
    return (prev, f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}")


# Write one numbered SRT block per transcript chunk.
with open(srt_filename, 'w', encoding='utf-8') as srt_file:
    prev = 0
    for index, chunk in enumerate(outputs['chunks']):
        prev, start_time = seconds_to_srt_time_format(prev, chunk['timestamp'][0])
        prev, end_time = seconds_to_srt_time_format(prev, chunk['timestamp'][1])
        srt_file.write(f"{index + 1}\n")
        srt_file.write(f"{start_time} --> {end_time}\n")
        srt_file.write(f"{chunk['text'].strip()}\n\n")

# Delete the intermediate JSON once the SRT file has been written.
os.remove(json_filename)
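
To try the conversion without a real transcript, the same logic can be sketched as pure functions that work on an in-memory list. The sample chunks below are made up. They only mirror the shape the script above reads: a `chunks` list whose items carry a `timestamp` pair and a `text` string. The names `format_srt_time` and `chunks_to_srt` are hypothetical. The second chunk has no end time, to show the fallback to the last valid timestamp.

```python
def format_srt_time(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, milliseconds = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def chunks_to_srt(chunks):
    """Turn a list of {'timestamp': [start, end], 'text': ...} into SRT text."""
    blocks = []
    prev = 0.0  # last valid timestamp seen so far
    for index, chunk in enumerate(chunks, start=1):
        times = []
        for value in chunk["timestamp"]:
            if isinstance(value, (int, float)):
                prev = value
            # A missing timestamp (None) reuses the last valid one.
            times.append(format_srt_time(prev))
        blocks.append(f"{index}\n{times[0]} --> {times[1]}\n{chunk['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up sample data for illustration only.
sample_chunks = [
    {"timestamp": [0.0, 2.5], "text": " Hello there."},
    {"timestamp": [2.5, None], "text": " Last chunk without an end time."},
]
print(chunks_to_srt(sample_chunks))
```

Because a missing end time reuses the previous timestamp, the second subtitle here ends up with zero duration (`00:00:02,500 --> 00:00:02,500`); some players may skip such an entry, so padding the end time could be worth considering.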