A Complete Guide to Training a Real-Time Paraformer Speech Recognition Model: From Data Preparation to Model Evaluation

🎤 Speech recognition now sits at the core of many applications, yet general-purpose models often perform poorly in specialized domains. This article walks through the full workflow for fine-tuning a real-time (streaming) Paraformer model with the FunASR framework, from data preparation and model training to performance evaluation, so that you can build a high-accuracy speech recognition system for a specific domain.

🔬 Background and Motivation

Limitations of General-Purpose Models

Most speech recognition models on the market are general-purpose, and they run into the following problems in domain-specific applications:

  • High character error rate: technical terms and industry vocabulary are recognized poorly
  • Pronunciation differences: conventions for reading numbers aloud differ from one domain to another
  • Context understanding: there is no language-model support for the specific scenario

Why Domain Specialization Is Needed

Aviation is a good example of a domain that poses many challenges:

graph TD
    A[General-purpose ASR model] --> B[Aviation application]
    B --> C[Low recognition accuracy]
    B --> D[Technical terms not recognized]
    B --> E[Number readings do not match]
    
    C --> F[Domain-specific training needed]
    D --> F
    E --> F
    
    F --> G[Higher recognition accuracy]
    F --> H[Adapted to professional scenarios]
    F --> I[Lower character error rate]

Specific problems include:

  • Aviation-specific terms, such as 盲降进近 (ILS approach) and 建立航道 (established on the localizer)
  • Number-reading conventions that differ from everyday speech
  • Domain-specific communication protocols and phraseology

Solution: fine-tuning the open-source Paraformer model on domain data can significantly improve recognition accuracy in the target scenario.

📊 Dataset Preparation

Dataset Overview

The dataset used for this training run has the following characteristics:

  • Size: 34,090 audio clips
  • Audio format: WAV files sampled at 8 kHz
  • Split: 90% training set (30,681 clips) + 10% validation set (3,409 clips)
  • Domain: aviation radio communication phraseology

Data quality requirements: make sure the audio is clear and the transcripts are accurate, so that noise does not degrade training.

Data Format

FunASR expects the data in a specific format:

Transcript file (train_text.txt)

A0001 这是音频转写的文本内容
A0002 这是测试内容
A0003 国航4341下到15保持可以盲降进近跑道19建立航道报

Audio path file (train_wav.scp)

A0001 ./train/A0001.wav
A0002 ./train/A0002.wav
A0003 ./train/A0003.wav

Format requirement: separate the audio ID from the file path with a space, and make sure each ID is exactly the same in both files.
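The ID requirement above can be checked automatically before training. The following is a minimal sketch, assuming both files use the "ID, space, rest of line" layout shown in the samples; the helper names `read_ids` and `check_id_consistency` are illustrative and are not part of FunASR.

```python
from collections import Counter


def read_ids(path):
    """Return the IDs (first whitespace-separated field) of every non-empty line."""
    ids = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                ids.append(line.split(maxsplit=1)[0])
    return ids


def check_id_consistency(text_path, scp_path):
    """Compare the IDs in a transcript file and a wav.scp file.

    Returns a tuple of three sorted lists:
    (IDs only in the transcript file, IDs only in the scp file, duplicated IDs).
    """
    text_ids = read_ids(text_path)
    scp_ids = read_ids(scp_path)
    only_in_text = sorted(set(text_ids) - set(scp_ids))
    only_in_scp = sorted(set(scp_ids) - set(text_ids))
    # An ID that appears more than once in either file is reported as a duplicate
    counts_text, counts_scp = Counter(text_ids), Counter(scp_ids)
    duplicates = sorted(
        {i for i, n in counts_text.items() if n > 1}
        | {i for i, n in counts_scp.items() if n > 1}
    )
    return only_in_text, only_in_scp, duplicates
```

Calling `check_id_consistency("train_text.txt", "train_wav.scp")` should return three empty lists for a clean dataset; anything else points at entries to fix before generating the JSONL files.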

Data Preprocessing Script

The raw data is spread across several subdirectories, so it has to be consolidated first. The following code merges the scattered audio files into a single directory:

import os
import shutil


def rename_files_in_directory(source_dir, target_dir, new_filename_prefix):
    """
    Copy the audio files in source_dir into target_dir, adding a prefix to
    each file name to avoid name collisions.

    Args:
        source_dir: directory containing the source audio files
        target_dir: unified target directory
        new_filename_prefix: file-name prefix (distinguishes the subdirectories)
    """
    # Create the target directory
    os.makedirs(target_dir, exist_ok=True)

    # Process every audio file in the directory
    for filename in os.listdir(source_dir):
        if filename.lower().endswith(('.wav', '.mp3', '.m4a')):
            # Build the new, prefixed file name
            new_filename = f"{new_filename_prefix}{filename}"
            old_file_path = os.path.join(source_dir, filename)
            new_file_path = os.path.join(target_dir, new_filename)

            # Copy the file into the target directory
            if os.path.isfile(old_file_path):
                shutil.copy2(old_file_path, new_file_path)
                print(f"Processed: {old_file_path} -> {new_file_path}")


def add_prefix_to_file(source_file, target_file, line_prefix):
    """
    Add a prefix to every line of a transcript file so that the audio IDs
    stay consistent with the renamed audio files.

    Args:
        source_file: source transcript file
        target_file: merged target file (lines are appended)
        line_prefix: line prefix (must match the audio file prefix)
    """
    try:
        with open(source_file, 'r', encoding='utf-8') as file:
            lines = file.readlines()

        # Prefix every non-empty line (blank lines would become bogus IDs)
        modified_lines = [f"{line_prefix}{line.strip()}" for line in lines if line.strip()]

        # Append to the target file
        with open(target_file, 'a', encoding='utf-8') as file:
            file.write('\n'.join(modified_lines) + '\n')

        print(f"Prefix '{line_prefix}' added; lines appended to {target_file}")
    except FileNotFoundError:
        print(f"Error: file '{source_file}' not found")
    except Exception as e:
        print(f"Processing failed: {e}")


def split_text(input_file, output_file):
    """
    Split run-together transcript lines into an ID and the transcript text.
    Conversion: "0001国航4341下到15保持" -> "0001 国航4341下到15保持"

    Args:
        input_file: raw transcript file
        output_file: output file in the standard format
    """
    with open(input_file, 'r', encoding='utf-8') as file:
        content = file.readlines()

    # Use a separate name for the handle so the output_file path is not shadowed
    with open(output_file, 'w', encoding='utf-8') as out:
        for line in content:
            line = line.strip()
            # Assumes the first 4 characters are the ID and the rest is the text
            if len(line) > 4:
                prefix = line[:4]          # audio ID
                suffix = line[4:].strip()  # transcript
                out.write(f"{prefix} {suffix}\n")


# Usage example
if __name__ == "__main__":
    # 1. Split the transcript lines into "ID text"
    split_text(
        r'D:\Works\Python\funasr\train\train_label\0001-1000.txt',
        r'D:\Works\Python\funasr\train\output\txt\0001-1000.txt'
    )

    # 2. Merge the audio files (adding the prefix A)
    # rename_files_in_directory(
    #     r'D:\Works\Python\funasr\train\train_GZ\0001-1000',
    #     r'D:\Works\Python\funasr\train\output\train',
    #     'A'
    # )

    # 3. Merge the transcript files (adding the prefix A)
    # add_prefix_to_file(
    #     r'D:\Works\Python\funasr\train\train_label\0001-1000.txt',
    #     r'D:\Works\Python\funasr\train\output\label\file.txt',
    #     'A'
    # )

🎵 Audio Sample Rate Conversion

Data consistency matters in machine learning training. If the pretrained model was trained on 16 kHz audio, fine-tuning it on data with the same sample rate gives better results. This project compared training at 8 kHz and at 16 kHz, and the 16 kHz model had a clearly lower character error rate.
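Before converting anything, it can help to confirm which sample rates the dataset actually contains. The sketch below uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV files with a lowercase `.wav` extension; the function names are illustrative.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def count_sample_rates(audio_dir):
    """Map each sample rate found under audio_dir to the number of files using it."""
    rates = Counter()
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        rates[wav_sample_rate(path)] += 1
    return dict(rates)
```

A result such as `{8000: 34090}` would confirm that every clip still needs to be resampled, while a mix of rates suggests that part of the data has already been converted.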

The following code uses FFmpeg to convert the sample rate, running several processes in parallel to speed up the conversion:

import os
import subprocess
from tqdm import tqdm
from multiprocessing import Pool


def convert_audio_file(args):
    """
    Convert a single audio file.

    Args:
        args: tuple of (input_file, output_sample_rate, output_dir)

    Returns:
        str or None: name of the converted file, or None if FFmpeg failed
    """
    input_file, output_sample_rate, output_dir = args
    output_file = os.path.join(
        output_dir,
        os.path.splitext(os.path.basename(input_file))[0] + ".wav"
    )

    # Call FFmpeg to convert the audio
    # -ar: set the audio sample rate
    # -ac 1: convert to mono
    # -y: overwrite the output file
    result = subprocess.run([
        "ffmpeg", "-i", input_file,
        "-ar", str(output_sample_rate),
        "-ac", "1",
        "-y", output_file
    ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    # Report a failed conversion instead of silently ignoring it
    if result.returncode != 0:
        return None
    return os.path.basename(output_file)


def convert_audio_sample_rate(input_dir, output_sample_rate, output_dir, num_processes=12):
    """
    Convert the sample rate of all audio files in a directory.

    Args:
        input_dir: input audio directory
        output_sample_rate: target sample rate (16000 recommended)
        output_dir: output directory
        num_processes: number of parallel processes (adjust to your CPU core count)
    """
    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)

    # Collect all audio files
    supported_formats = ('.wav', '.mp3', '.m4a', '.flac')
    files_to_convert = [
        os.path.join(input_dir, f)
        for f in os.listdir(input_dir)
        if f.lower().endswith(supported_formats)
    ]

    print(f"Found {len(files_to_convert)} audio files, starting conversion...")

    # Create the progress bar
    progress_bar = tqdm(
        total=len(files_to_convert),
        unit="files",
        desc=f"Converting to {output_sample_rate}Hz"
    )

    # Process the files in parallel
    failed = 0
    with Pool(processes=num_processes) as pool:
        args_list = [(f, output_sample_rate, output_dir) for f in files_to_convert]
        for result in pool.imap(convert_audio_file, args_list):
            if result is None:
                failed += 1
            progress_bar.update(1)

    progress_bar.close()
    print(f"✅ Conversion finished ({failed} failed). Converted files are in: {output_dir}")


# Usage example
if __name__ == '__main__':
    input_dir = r"D:\Works\Python\funasr\train\train_GZ\0001-1000"
    output_dir = r"D:\Works\test\16k_audio"
    target_sr = 16000  # 16 kHz is recommended

    convert_audio_sample_rate(
        input_dir=input_dir,
        output_sample_rate=target_sr,
        output_dir=output_dir,
        num_processes=8  # adjust to your CPU core count
    )

Performance tips:

  • Set the number of processes to about 70-80% of your CPU cores
  • Make sure there is enough disk space for the converted audio
  • A 16 kHz sample rate offers a good balance between recognition accuracy and file size
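The first tip can be turned into a small helper. This is only a sketch: the default of 0.75 is an assumed midpoint of the 70-80% range, and `suggested_num_processes` is a hypothetical name.

```python
import os


def suggested_num_processes(fraction=0.75, cpu_count=None):
    """Return a worker count equal to `fraction` of the CPU cores, but at least 1."""
    if cpu_count is None:
        # os.cpu_count() may return None on unusual platforms
        cpu_count = os.cpu_count() or 1
    return max(1, int(cpu_count * fraction))
```

The result can be passed as `num_processes` to a conversion function like the one in this section.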

Once the dataset has been organized:

  • All training audio is stored in a single directory
  • All transcripts are merged into a single text file
  • The data is split into training and validation sets at a 9:1 ratio
  • All audio is converted to a 16 kHz sample rate (recommended)
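The 9:1 split can be scripted so that it is reproducible. The sketch below assumes the merged transcript and path files use the "ID, space, content" layout described earlier; the function names and the fixed seed are illustrative choices, not part of FunASR.

```python
import random


def load_entries(path):
    """Read an 'ID content' file into a dict that maps ID to content."""
    entries = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:
                entries[parts[0]] = parts[1]
    return entries


def split_ids(ids, holdout_every=10, seed=0):
    """Shuffle the IDs deterministically and hold out 1/holdout_every for validation."""
    shuffled = sorted(ids)
    random.Random(seed).shuffle(shuffled)
    n_val = len(shuffled) // holdout_every
    return shuffled[n_val:], shuffled[:n_val]  # (train_ids, val_ids)


def write_subset(entries, ids, out_path):
    """Write the selected IDs back out in 'ID content' format."""
    with open(out_path, "w", encoding="utf-8") as f:
        for utt_id in ids:
            f.write(f"{utt_id} {entries[utt_id]}\n")
```

Applying the same `train_ids` and `val_ids` to both the transcript entries and the audio path entries keeps the four output files (train_text.txt, train_wav.scp, val_text.txt, val_wav.scp) aligned. With 34,090 IDs this yields 30,681 training and 3,409 validation items.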

🚀 Model Training

Hardware Requirements

Before you start training, make sure your hardware meets the following requirements:

| Item | Minimum | Recommended | This experiment |
| GPU memory | 12GB | 24GB+ | 48GB (2×RTX4090) |
| RAM | 32GB | 64GB+ | 64GB |
| Storage | 50GB | 100GB+ | 200GB |
| Number of GPUs | 1 | 2+ | 2 |

Estimated training time: depending on the size of the dataset and the hardware, a full training run can take anywhere from a few hours to several tens of hours.

Training Script Configuration

FunASR ships a complete training script template. Edit FunASR/examples/industrial_data_pretraining/paraformer_streaming/finetune.sh:

#!/bin/bash
# Copyright FunASR (https://github.com/alibaba-damo-academy/FunASR). All Rights Reserved.
# MIT License (https://opensource.org/licenses/MIT)

# =============================================================================
# GPU configuration
# =============================================================================
# GPU indices to use (starting from 0)
export CUDA_VISIBLE_DEVICES="0,1"  # for a single GPU use "0"
gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')

# =============================================================================
# Model configuration
# =============================================================================
# Option 1: let ModelScope download the model automatically
model_name_or_model_dir="iic/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online"

# Option 2: use a model that has already been downloaded locally (recommended)
# model_name_or_model_dir="/path/to/your/local/model"

# Option 3: download the model with Git
# local_path_root=${workspace}/modelscope_models
# mkdir -p ${local_path_root}/${model_name_or_model_dir}
# git clone https://www.modelscope.cn/${model_name_or_model_dir}.git ${local_path_root}/${model_name_or_model_dir}
# model_name_or_model_dir=${local_path_root}/${model_name_or_model_dir}

# =============================================================================
# Data configuration
# =============================================================================
# Data root directory (contains the train and val files)
data_dir="../../../data/list"

# JSONL data files (generated automatically below)
train_data="${data_dir}/train.jsonl"
val_data="${data_dir}/val.jsonl"

# Generate the JSONL file for the training set
echo "📝 Generating the training dataset..."
scp2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="${train_data}"

# Generate the JSONL file for the validation set
echo "📝 Generating the validation dataset..."
scp2jsonl \
++scp_file_list='["../../../data/list/val_wav.scp", "../../../data/list/val_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="${val_data}"

# =============================================================================
# Output configuration
# =============================================================================
output_dir="./outputs"
log_file="${output_dir}/log.txt"
mkdir -p "${output_dir}"

echo "📁 The training log will be saved to: ${log_file}"

# =============================================================================
# Training parameters
# =============================================================================
# Launch distributed training with torchrun
echo "🚀 Starting model training..."
torchrun \
--nnodes 1 \
--nproc_per_node ${gpu_num} \
../../../funasr/bin/train.py \
++model="${model_name_or_model_dir}" \
++train_data_set_list="${train_data}" \
++valid_data_set_list="${val_data}" \
++dataset_conf.batch_size=20000 \
++dataset_conf.batch_type="token" \
++dataset_conf.num_workers=4 \
++train_conf.max_epoch=50 \
++train_conf.log_interval=1 \
++train_conf.resume=false \
++train_conf.validate_interval=2000 \
++train_conf.save_checkpoint_interval=2000 \
++train_conf.keep_nbest_models=20 \
++optim_conf.lr=0.0002 \
++output_dir="${output_dir}" &> "${log_file}"

echo "✅ Training finished. The model is saved in: ${output_dir}"
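After the script has generated train.jsonl and val.jsonl, a quick sanity check can catch malformed records before a long training run. The sketch below assumes each line is a JSON object containing at least the `source` and `target` keys named in the script's `data_type_list`; any other fields are ignored, and `check_jsonl` is a hypothetical helper, not a FunASR API.

```python
import json


def check_jsonl(path, required_keys=("source", "target")):
    """Validate a JSON Lines file.

    Returns (number_of_records, problems), where problems is a list of
    (line_number, message) tuples.
    """
    problems = []
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if not line.strip():
                continue  # ignore blank lines
            count += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((line_no, f"invalid JSON: {e.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "record is not a JSON object"))
                continue
            # Assumption: both keys must be present and non-empty
            missing = [k for k in required_keys if not record.get(k)]
            if missing:
                problems.append((line_no, "missing or empty: " + ", ".join(missing)))
    return count, problems
```

For a healthy file, `check_jsonl("../../../data/list/train.jsonl")` should return the expected number of records and an empty problem list.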

🛠️ Training Parameters in Detail

The table below explains each training parameter and its recommended setting:

| Parameter | Default | Recommended range | Description |
| Dataset parameters | | | |
| batch_size | 20000 | 10000-30000 | Number of tokens per batch; adjust to your GPU memory |
| batch_type | "token" | "token"/"length" | Batching type; "token" is more stable |
| num_workers | 4 | 2-8 | Number of data-loading workers; set according to your CPU core count |
| Training parameters | | | |
| max_epoch | 50 | 20-200 | Maximum number of training epochs; limit it to prevent overfitting |
| log_interval | 1 | 1-10 | Logging interval (steps) |
| validate_interval | 2000 | 1000-5000 | Validation interval (steps) |
| save_checkpoint_interval | 2000 | 1000-5000 | Checkpoint saving interval (steps) |
| keep_nbest_models | 20 | 5-50 | Number of best models to keep |
| Optimizer parameters | | | |
| lr | 0.0002 | 0.0001-0.001 | Initial learning rate; keep it small when fine-tuning |

Tuning advice:

  1. batch_size: adjust to your GPU memory; the more memory you have, the larger the batch_size can be
  2. max_epoch: for a first run, 50-100 is a reasonable setting; adjust after observing convergence
  3. Learning rate: when fine-tuning a pretrained model, keep the learning rate modest so that the pretrained features are not destroyed

📝 Running the Training

Once the training script is configured, run the training as follows:

1. Run the training in the background

# Run in the background with nohup so that closing the terminal does not interrupt training
nohup bash finetune.sh > train.log 2>&1 &

# Check that the training process is running
ps aux | grep train.py

2. Monitor training progress

# Follow the output of finetune.sh itself (its echo messages)
tail -f train.log

# Follow the detailed training log, which finetune.sh redirects to the output directory
tail -f ./outputs/log.txt

3. Start TensorBoard

# Stop any TensorBoard instance that is already running
pkill -f tensorboard

# Start a new TensorBoard and point it at the log directory
nohup tensorboard --port 6007 --logdir ./outputs/tensorboard > tensorboard.log 2>&1 &

# Open this address in a browser
echo "📊 TensorBoard address: http://localhost:6007"

Signs that training is complete:

  • A model.pt file has been written to the output directory
  • The loss on the validation set has levelled off
  • TensorBoard shows a clear convergence trend

📊 Monitoring Training Progress

Keep an eye on the following indicators during training:

graph TD
    A[Training starts] --> B[Monitor the loss curve]
    B --> C[Check validation performance]
    C --> D{Overfitting?}
    D -->|Yes| E[Lower the learning rate or stop early]
    D -->|No| F{Converged?}
    F -->|Yes| G[Training complete]
    F -->|No| H[Continue training]
    H --> B
    E --> B

Key indicators:

  • Training loss (Train Loss): should keep decreasing
  • Validation loss (Valid Loss): should decrease along with it; if it starts to rise, the model may be overfitting
  • Character error rate (CER): measures how the model performs on the validation set
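The overfitting check described above can be expressed as a simple patience rule over the validation losses recorded at each evaluation. This is a minimal sketch with hypothetical function names; the default patience of 3 evaluations is an assumption, and this is not a feature of the FunASR trainer.

```python
def evaluations_since_best(valid_losses):
    """Return how many evaluations have passed since the lowest validation loss."""
    if not valid_losses:
        return 0
    # On ties, the earliest occurrence of the minimum counts as the best
    best_index = min(range(len(valid_losses)), key=valid_losses.__getitem__)
    return len(valid_losses) - 1 - best_index


def should_stop_early(valid_losses, patience=3):
    """True when the validation loss has not improved for `patience` evaluations."""
    return evaluations_since_best(valid_losses) >= patience
```

For example, with losses `[1.0, 0.8, 0.7, 0.72, 0.75, 0.9]` the best value was seen three evaluations ago, so `should_stop_early` returns `True` with the default patience.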

🎯 Model Inference

Preparing the Model for Deployment

After training finishes, replace the model file in the original model directory with the fine-tuned one:

# 1. Back up the original model
cp /path/to/original/model.pt /path/to/original/model.pt.backup

# 2. Replace it with the fine-tuned model
cp ./outputs/model.pt /path/to/original/model.pt

# 3. Confirm that the file is in place and check its size
ls -la /path/to/original/model.pt

Important: always back up the original model before replacing the model file, in case something goes wrong.
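A directory listing only shows that the file exists. To confirm that the copy is byte-for-byte identical to the trained model, the checksums can be compared. The sketch below uses only the standard library; `copy_and_verify` is a hypothetical helper name.

```python
import hashlib
import shutil


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and raise OSError if the two checksums differ."""
    shutil.copy2(src, dst)
    src_hash = sha256_of(src)
    dst_hash = sha256_of(dst)
    if src_hash != dst_hash:
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return dst_hash
```

Using `copy_and_verify("./outputs/model.pt", "/path/to/original/model.pt")` in place of the plain `cp` gives the same result with an integrity check on top.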

Inference Code

The following is the complete inference code, which supports real-time streaming recognition:

import argparse
import os
from typing import List

import soundfile
from funasr import AutoModel


def parse_arguments():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description='Real-time Paraformer speech recognition inference')
    parser.add_argument(
        "--asr_model_online_revision",
        type=str,
        default="v2.0.4",
        help="model revision"
    )
    parser.add_argument(
        "--asr_model_online",
        type=str,
        default=r"C:\Users\21316\.cache\modelscope\hub\iic\speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online",
        help="model path (local directory or ModelScope name)"
    )
    parser.add_argument(
        "--ngpu",
        type=int,
        default=1,
        help="number of GPUs (0=CPU, 1=GPU)"
    )
    parser.add_argument(
        "--device",
        type=str,
        default="cuda",
        help="compute device (cuda/cpu)"
    )
    parser.add_argument(
        "--ncpu",
        type=int,
        default=4,
        help="number of CPU cores"
    )
    return parser.parse_args()


def load_model(args):
    """Load the speech recognition model."""
    print(f"🚀 Loading model: {args.asr_model_online}")

    model = AutoModel(
        model=args.asr_model_online,
        model_revision=args.asr_model_online_revision,
        ngpu=args.ngpu,
        ncpu=args.ncpu,
        device=args.device,
        disable_pbar=True,    # disable the progress bar
        disable_log=True,     # disable logging
        disable_update=True   # disable automatic updates
    )

    print("✅ Model loaded successfully")
    return model


def infer_batch(model, wav_file_dir: str) -> List[str]:
    """
    Run inference on every audio file in a directory.

    Args:
        model: the loaded ASR model
        wav_file_dir: path of the directory containing the audio files

    Returns:
        List[str]: list of recognition results
    """
    final_result_list = []

    # Streaming recognition parameters
    chunk_size = [0, 10, 5]      # [0, 10, 5] = 600ms, [0, 8, 4] = 480ms
    encoder_chunk_look_back = 4  # number of chunks the encoder looks back
    decoder_chunk_look_back = 1  # number of chunks the decoder looks back

    if not os.path.isdir(wav_file_dir):
        print(f"❌ Error: {wav_file_dir} is not a valid directory")
        return final_result_list

    # Collect all audio files
    audio_files = []
    for root, dirs, files in os.walk(wav_file_dir):
        for file in files:
            if file.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
                audio_files.append(os.path.join(root, file))

    print(f"📁 Found {len(audio_files)} audio files, starting inference...")

    for i, wav_file in enumerate(audio_files, 1):
        try:
            print(f"🎧 [{i}/{len(audio_files)}] Processing: {os.path.basename(wav_file)}")

            # Load the audio file
            speech, sample_rate = soundfile.read(wav_file)
            # The chunk arithmetic below assumes 16 kHz audio
            if sample_rate != 16000:
                print(f"⚠️ {wav_file} is sampled at {sample_rate} Hz; 16000 Hz is expected")
            chunk_stride = chunk_size[1] * 960  # samples in 600ms at 16 kHz

            # Streaming recognition
            res_txt = []
            cache = {}  # carries context between chunks
            total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)

            for chunk_idx in range(total_chunk_num):
                # Extract the audio of the current chunk
                start_idx = chunk_idx * chunk_stride
                end_idx = (chunk_idx + 1) * chunk_stride
                speech_chunk = speech[start_idx:end_idx]

                # Is this the last chunk?
                is_final = chunk_idx == total_chunk_num - 1

                # Run recognition
                res = model.generate(
                    input=speech_chunk,
                    cache=cache,
                    is_final=is_final,
                    chunk_size=chunk_size,
                    encoder_chunk_look_back=encoder_chunk_look_back,
                    decoder_chunk_look_back=decoder_chunk_look_back
                )

                # Collect the recognition result
                if res and len(res) > 0 and 'text' in res[0]:
                    res_txt.append(res[0]['text'])

            # Join the results of all chunks
            final_res = ''.join(res_txt)
            print(f"✅ Result: {final_res}")
            final_result_list.append(final_res)

        except Exception as e:
            print(f"❌ Error while processing {wav_file}: {e}")
            final_result_list.append("")  # keep indices aligned with audio_files

    print(f"✅ Batch inference finished: {len(audio_files)} files processed")
    return final_result_list


def main():
    """Entry point."""
    # Parse the arguments
    args = parse_arguments()

    # Load the model
    model = load_model(args)

    # Run inference (change this to your own audio directory)
    audio_dir = r"D:\path\to\your\audio\files"
    results = infer_batch(model, audio_dir)

    # Print the results
    print("\n📈 Inference results:")
    for i, result in enumerate(results, 1):
        print(f"{i:3d}: {result}")


if __name__ == "__main__":
    main()
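The chunk arithmetic in the inference code can be isolated to make the numbers easier to follow. In the code above, `chunk_size[1] * 960` corresponds to 600 ms of 16 kHz audio, so each unit of `chunk_size[1]` covers 960 samples, or 60 ms. The sketch below restates that relationship; the helper names and the `frame_ms` parameter are illustrative and are not FunASR APIs.

```python
def chunk_stride_samples(chunk_size, sample_rate=16000, frame_ms=60):
    """Samples per streaming chunk: chunk_size[1] frames of frame_ms milliseconds each."""
    return chunk_size[1] * frame_ms * sample_rate // 1000


def num_chunks(num_samples, stride):
    """Number of chunks needed to cover num_samples (ceiling division)."""
    return (num_samples + stride - 1) // stride
```

With `chunk_size = [0, 10, 5]` the stride is 9,600 samples (600 ms), and with `[0, 8, 4]` it is 7,680 samples (480 ms). A 3-second clip at 16 kHz (48,000 samples) is therefore processed in 5 chunks of 600 ms.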

📁 Supporting Utility Files

The text normalization helpers needed during evaluation:

from utils.basic import BasicTextNormalizer


def normalize_texts_chinese(texts):
    """Normalize Chinese text."""
    normalizer = BasicTextNormalizer(remove_diacritics=False, split_letters=True)
    return [normalizer(text) for text in texts]


def normalize_texts_japanese_korean(texts):
    """Normalize Japanese/Korean text."""
    normalizer = BasicTextNormalizer(remove_diacritics=False, split_letters=True)
    return [normalizer(text) for text in texts]


def normalize_texts_english(texts):
    """Normalize English text."""
    normalizer = BasicTextNormalizer(remove_diacritics=True, split_letters=False)
    return [normalizer(text) for text in texts]


def normalize_texts_multi_language(texts):
    """Normalize multilingual text."""
    normalizer = BasicTextNormalizer(remove_diacritics=True, split_letters=False)
    return [normalizer(text) for text in texts]

🎯 Interpreting the Evaluation Results

After the evaluation you will have the following key figures:

| Metric | Excellent | Good | Fair | Needs improvement |
| Chinese CER | < 5% | 5-10% | 10-20% | > 20% |
| English WER | < 10% | 10-15% | 15-25% | > 25% |

Evaluation best practices:

  1. Diverse test set: make sure the test data covers a range of scenarios and speakers
  2. Baseline comparison: compare against the original model before fine-tuning to verify the effect of training
  3. Error analysis: analyze the error types in depth to guide further optimization
  4. Continuous monitoring: evaluate the model on new data at regular intervals
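For a quick check without extra dependencies, CER can be computed directly from the character-level edit distance and then mapped onto the bands in the table. The sketch below is an illustrative, dependency-free version and not the jiwer-based metric used elsewhere in this article. The band boundaries are treated as half-open (for example, exactly 10% counts as fair), which is an assumption because the table leaves the edges ambiguous. Every character, including spaces, is counted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references, predictions):
    """Corpus-level CER: total edit distance divided by total reference characters."""
    errors = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total = sum(len(r) for r in references)
    return errors / total


def rate_chinese_cer(value):
    """Map a CER given as a fraction (0.05 means 5%) onto the bands in the table."""
    if value < 0.05:
        return "excellent"
    if value < 0.10:
        return "good"
    if value <= 0.20:
        return "fair"
    return "needs improvement"
```

For instance, if the reference is `国航4341` and the prediction is `国行4341`, one of six characters is wrong, so the CER is 1/6 (about 16.7%), which falls in the fair band.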


🎯 Summary and Outlook

📋 Project Summary

Following the workflow in this article, we carried out domain-specific training of a Paraformer speech recognition model with the FunASR framework. The main results are:

Technical implementation:

  • ✅ Preprocessed and converted 34,090 aviation-domain speech clips
  • ✅ Trained a dedicated speech recognition model for aviation communication
  • ✅ Implemented an efficient real-time streaming inference system
  • ✅ Set up a complete model evaluation and performance monitoring process

Performance gains:

  • 📈 A markedly lower character error rate in the aviation domain than the general-purpose model
  • 📈 Much better recognition of technical terms
  • 📈 More accurate recognition of number readings and domain protocols

🚀 Future Improvements

  1. Data augmentation

    • Collect more diverse aviation communication data
    • Introduce data augmentation techniques (speed perturbation, added noise, and so on)
    • Balance the data distribution across scenarios and speakers
  2. Model optimization

    • Try larger pretrained models
    • Experiment with different learning-rate schedules
    • Explore model compression techniques such as knowledge distillation
  3. System integration

    • Build a real-time speech recognition API service
    • Integrate voice activity detection (VAD)
    • Build a complete speech processing pipeline
  4. Extension to other domains

    • Extend to other professional domains such as healthcare and finance
    • Support mixed-language recognition
    • Develop an online learning mechanism for domain adaptation

💡 Lessons Learned

Success factors:

  • 📊 High-quality data: make sure the transcripts are accurate and the audio is clear
  • ⚙️ Sensible parameter settings: tune according to your hardware and the characteristics of the data
  • 🔍 Continuous monitoring: track training progress and model performance as training runs
  • 🎯 Thorough evaluation: validate the model from several angles

Common challenges:

  • 🔧 Data imbalance: some technical terms have too few samples
  • 💾 Resource limits: trading off GPU memory against training time
  • 🎚️ Hyperparameter tuning: finding the best configuration takes several experiments
  • 📈 Overfitting risk: small datasets overfit easily

Key advice: domain-specific training rewards patience and a systematic experimental method. Every adjustment should have a clear hypothesis and a way to verify it; that is how you eventually get a satisfactory result.


🎉 Congratulations on completing this challenging project!

Domain-specific training for speech recognition takes patience and skill, but with a systematic approach and steady practice you can build a high-quality speech recognition system that meets your specific needs. I hope this article helps you in your study and work.

If you run into any problems in practice, feel free to discuss them in the comments. Let's advance speech recognition technology together!

./utils/cer.py

# Copyright 2021 The HuggingFace Evaluate Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Character Error Ratio (CER) metric. """

from typing import List

import datasets
import jiwer
import jiwer.transforms as tr
from datasets.config import PY_VERSION
from packaging import version

import evaluate


if PY_VERSION < version.parse("3.8"):
    import importlib_metadata
else:
    import importlib.metadata as importlib_metadata


SENTENCE_DELIMITER = ""


if version.parse(importlib_metadata.version("jiwer")) < version.parse("2.3.0"):

    class SentencesToListOfCharacters(tr.AbstractTransform):
        def __init__(self, sentence_delimiter: str = " "):
            self.sentence_delimiter = sentence_delimiter

        def process_string(self, s: str):
            return list(s)

        def process_list(self, inp: List[str]):
            chars = []
            for sent_idx, sentence in enumerate(inp):
                chars.extend(self.process_string(sentence))
                if self.sentence_delimiter is not None and self.sentence_delimiter != "" and sent_idx < len(inp) - 1:
                    chars.append(self.sentence_delimiter)
            return chars

    cer_transform = tr.Compose(
        [tr.RemoveMultipleSpaces(), tr.Strip(), SentencesToListOfCharacters(SENTENCE_DELIMITER)]
    )
else:
    cer_transform = tr.Compose(
        [
            tr.RemoveMultipleSpaces(),
            tr.Strip(),
            tr.ReduceToSingleSentence(SENTENCE_DELIMITER),
            tr.ReduceToListOfListOfChars(),
        ]
    )


_CITATION = """
@inproceedings{inproceedings,
    author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
    year = {2004},
    month = {01},
    pages = {},
    title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
"""

_DESCRIPTION = """
Character error rate (CER) is a common metric of the performance of an automatic speech recognition system.

CER is similar to Word Error Rate (WER), but operates on character instead of word. Please refer to docs of WER for further information.

Character error rate can be computed as:

CER = (S + D + I) / N = (S + D + I) / (S + D + C)

where

S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct characters,
N is the number of characters in the reference (N=S+D+C).

CER's output is not always a number between 0 and 1, in particular when there is a high number of insertions. This value is often associated to the percentage of characters that were incorrectly predicted. The lower the value, the better the
performance of the ASR system with a CER of 0 being a perfect score.
"""

_KWARGS_DESCRIPTION = """
Computes CER score of transcribed segments against references.
Args:
    references: list of references for each speech input.
    predictions: list of transcribtions to score.
    concatenate_texts: Whether or not to concatenate sentences before evaluation, set to True for more accurate result.
Returns:
    (float): the character error rate

Examples:

    >>> predictions = ["this is the prediction", "there is an other sample"]
    >>> references = ["this is the reference", "there is another one"]
    >>> cer = evaluate.load("cer")
    >>> cer_score = cer.compute(predictions=predictions, references=references)
    >>> print(cer_score)
    0.34146341463414637
"""


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class CER(evaluate.Metric):
    def _info(self):
        return evaluate.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string", id="sequence"),
                    "references": datasets.Value("string", id="sequence"),
                }
            ),
            codebase_urls=["https://github.com/jitsi/jiwer/"],
            reference_urls=[
                "https://en.wikipedia.org/wiki/Word_error_rate",
                "https://sites.google.com/site/textdigitisation/qualitymeasures/computingerrorrates",
            ],
        )

    def _compute(self, predictions, references, concatenate_texts=False):
        if concatenate_texts:
            return jiwer.compute_measures(
                references,
                predictions,
                truth_transform=cer_transform,
                hypothesis_transform=cer_transform,
            )["wer"]

        incorrect = 0
        total = 0
        for prediction, reference in zip(predictions, references):
            measures = jiwer.compute_measures(
                reference,
                prediction,
                truth_transform=cer_transform,
                hypothesis_transform=cer_transform,
            )
            incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
            total += measures["substitutions"] + measures["deletions"] + measures["hits"]

        return incorrect / total

./utils/wer.py

Copyright 2021 The HuggingFace Evaluate Authors.

Licensed under the Apache License, Version 2.0 (the ā€œLicenseā€);

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an ā€œAS ISā€ BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License.

ā€œā€ā€ Word Error Ratio (WER) metric. ā€œā€ā€

import datasets
from jiwer import compute_measures

import evaluate

_CITATION = ā€œā€ā€
@inproceedings{inproceedings,
author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
year = {2004},
month = {01},
pages = {},
title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
ā€œā€ā€

_DESCRIPTION = ā€œā€ā€
Word error rate (WER) is a common metric of the performance of an automatic speech recognition system.

The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort.

This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate.

Word error rate can then be computed as:

WER = (S + D + I) / N = (S + D + I) / (S + D + C)

where

S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct words,
N is the number of words in the reference (N=S+D+C).

This value indicates the average number of errors per reference word. The lower the value, the better the
performance of the ASR system with a WER of 0 being a perfect score.
"""

_KWARGS_DESCRIPTION = """
Compute WER score of transcribed segments against references.

Args:
references: List of references for each speech input.
predictions: List of transcriptions to score.
concatenate_texts (bool, default=False): Whether to concatenate all input texts or compute WER iteratively.

Returns:
(float): the word error rate

Examples:

>>> predictions = ["this is the prediction", "there is an other sample"]
>>> references = ["this is the reference", "there is another one"]
>>> wer = evaluate.load("wer")
>>> wer_score = wer.compute(predictions=predictions, references=references)
>>> print(wer_score)
0.5

"""

@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class WER(evaluate.Metric):
    def _info(self):
        return evaluate.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string", id="sequence"),
                    "references": datasets.Value("string", id="sequence"),
                }
            ),
            codebase_urls=["https://github.com/jitsi/jiwer/"],
            reference_urls=[
                "https://en.wikipedia.org/wiki/Word_error_rate",
            ],
        )

    def _compute(self, predictions=None, references=None, concatenate_texts=False):
        if concatenate_texts:
            return compute_measures(references, predictions)["wer"]
        else:
            incorrect = 0
            total = 0
            for prediction, reference in zip(predictions, references):
                measures = compute_measures(reference, prediction)
                incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
                total += measures["substitutions"] + measures["deletions"] + measures["hits"]
            return incorrect / total
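
To make the formula WER = (S + D + I) / N concrete, here is a minimal, dependency-free sketch. It is not the jiwer implementation used by the metric above. The function names `edit_distance` and `corpus_wer` are illustrative assumptions. The sketch computes the word-level Levenshtein distance (which equals S + D + I for an optimal alignment) and sums errors and reference lengths across all pairs, mirroring the non-concatenated branch of `_compute`.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] is the distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words turns ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], predictions: List[str]) -> float:
    """Total errors over total reference words, accumulated across all pairs."""
    errors = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        errors += edit_distance(ref_words, prediction.split())
        total_words += len(ref_words)  # N = S + D + C
    return errors / total_words


if __name__ == "__main__":
    predictions = ["this is the prediction", "there is an other sample"]
    references = ["this is the reference", "there is another one"]
    print(corpus_wer(references, predictions))  # 0.5
```

The first pair contributes 1 error over 4 reference words and the second contributes 3 errors over 4, so the pooled rate is 4 / 8 = 0.5, matching the docstring example.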

./utils/english.py

import json
import os
import re
from fractions import Fraction
from typing import Iterator, List, Match, Optional, Union

from more_itertools import windowed

from .basic import remove_symbols_and_diacritics

class EnglishNumberNormalizer:
"""
Convert any spelled-out numbers into arabic numbers, while handling:

- remove any commas
- keep the suffixes such as: `1960s`, `274th`, `32nd`, etc.
- spell out currency symbols after the number. e.g. `$20 million` -> `20000000 dollars`
- spell out `one` and `ones`
- interpret successive single-digit numbers as nominal: `one oh one` -> `101`
"""

def __init__(self):
    super().__init__()

    self.zeros = {"o", "oh", "zero"}
    self.ones = {
        name: i
        for i, name in enumerate(
            [
                "one",
                "two",
                "three",
                "four",
                "five",
                "six",
                "seven",
                "eight",
                "nine",
                "ten",
                "eleven",
                "twelve",
                "thirteen",
                "fourteen",
                "fifteen",
                "sixteen",
                "seventeen",
                "eighteen",
                "nineteen",
            ],
            start=1,
        )
    }
    self.ones_plural = {
        "sixes" if name == "six" else name + "s": (value, "s")
        for name, value in self.ones.items()
    }
    self.ones_ordinal = {
        "zeroth": (0, "th"),
        "first": (1, "st"),
        "second": (2, "nd"),
        "third": (3, "rd"),
        "fifth": (5, "th"),
        "twelfth": (12, "th"),
        **{
            name + ("h" if name.endswith("t") else "th"): (value, "th")
            for name, value in self.ones.items()
            if value > 3 and value != 5 and value != 12
        },
    }
    self.ones_suffixed = {**self.ones_plural, **self.ones_ordinal}

    self.tens = {
        "twenty": 20,
        "thirty": 30,
        "forty": 40,
        "fifty": 50,
        "sixty": 60,
        "seventy": 70,
        "eighty": 80,
        "ninety": 90,
    }
    self.tens_plural = {
        name.replace("y", "ies"): (value, "s") for name, value in self.tens.items()
    }
    self.tens_ordinal = {
        name.replace("y", "ieth"): (value, "th")
        for name, value in self.tens.items()
    }
    self.tens_suffixed = {**self.tens_plural, **self.tens_ordinal}

    self.multipliers = {
        "hundred": 100,
        "thousand": 1_000,
        "million": 1_000_000,
        "billion": 1_000_000_000,
        "trillion": 1_000_000_000_000,
        "quadrillion": 1_000_000_000_000_000,
        "quintillion": 1_000_000_000_000_000_000,
        "sextillion": 1_000_000_000_000_000_000_000,
        "septillion": 1_000_000_000_000_000_000_000_000,
        "octillion": 1_000_000_000_000_000_000_000_000_000,
        "nonillion": 1_000_000_000_000_000_000_000_000_000_000,
        "decillion": 1_000_000_000_000_000_000_000_000_000_000_000,
    }
    self.multipliers_plural = {
        name + "s": (value, "s") for name, value in self.multipliers.items()
    }
    self.multipliers_ordinal = {
        name + "th": (value, "th") for name, value in self.multipliers.items()
    }
    self.multipliers_suffixed = {
        **self.multipliers_plural,
        **self.multipliers_ordinal,
    }
    self.decimals = {*self.ones, *self.tens, *self.zeros}

    self.preceding_prefixers = {
        "minus": "-",
        "negative": "-",
        "plus": "+",
        "positive": "+",
    }
    self.following_prefixers = {
        "pound": "£",
        "pounds": "£",
        "euro": "€",
        "euros": "€",
        "dollar": "$",
        "dollars": "$",
        "cent": "¢",
        "cents": "¢",
    }
    self.prefixes = set(
        list(self.preceding_prefixers.values())
        + list(self.following_prefixers.values())
    )
    self.suffixers = {
        "per": {"cent": "%"},
        "percent": "%",
    }
    self.specials = {"and", "double", "triple", "point"}

    self.words = set(
        [
            key
            for mapping in [
                self.zeros,
                self.ones,
                self.ones_suffixed,
                self.tens,
                self.tens_suffixed,
                self.multipliers,
                self.multipliers_suffixed,
                self.preceding_prefixers,
                self.following_prefixers,
                self.suffixers,
                self.specials,
            ]
            for key in mapping
        ]
    )
    self.literal_words = {"one", "ones"}

def process_words(self, words: List[str]) -> Iterator[str]:
    prefix: Optional[str] = None
    value: Optional[Union[str, int]] = None
    skip = False

    def to_fraction(s: str):
        try:
            return Fraction(s)
        except ValueError:
            return None

    def output(result: Union[str, int]):
        nonlocal prefix, value
        result = str(result)
        if prefix is not None:
            result = prefix + result
        value = None
        prefix = None
        return result

    if len(words) == 0:
        return

    for prev, current, next in windowed([None] + words + [None], 3):
        if skip:
            skip = False
            continue

        next_is_numeric = next is not None and re.match(r"^\d+(\.\d+)?$", next)
        has_prefix = current[0] in self.prefixes
        current_without_prefix = current[1:] if has_prefix else current
        if re.match(r"^\d+(\.\d+)?$", current_without_prefix):
            # arabic numbers (potentially with signs and fractions)
            f = to_fraction(current_without_prefix)
            assert f is not None
            if value is not None:
                if isinstance(value, str) and value.endswith("."):
                    # concatenate decimals / ip address components
                    value = str(value) + str(current)
                    continue
                else:
                    yield output(value)

            prefix = current[0] if has_prefix else prefix
            if f.denominator == 1:
                value = f.numerator  # store integers as int
            else:
                value = current_without_prefix
        elif current not in self.words:
            # non-numeric words
            if value is not None:
                yield output(value)
            yield output(current)
        elif current in self.zeros:
            value = str(value or "") + "0"
        elif current in self.ones:
            ones = self.ones[current]

            if value is None:
                value = ones
            elif isinstance(value, str) or prev in self.ones:
                if (
                    prev in self.tens and ones < 10
                ):  # replace the last zero with the digit
                    assert value[-1] == "0"
                    value = value[:-1] + str(ones)
                else:
                    value = str(value) + str(ones)
            elif ones < 10:
                if value % 10 == 0:
                    value += ones
                else:
                    value = str(value) + str(ones)
            else:  # eleven to nineteen
                if value % 100 == 0:
                    value += ones
                else:
                    value = str(value) + str(ones)
        elif current in self.ones_suffixed:
            # ordinal or cardinal; yield the number right away
            ones, suffix = self.ones_suffixed[current]
            if value is None:
                yield output(str(ones) + suffix)
            elif isinstance(value, str) or prev in self.ones:
                if prev in self.tens and ones < 10:
                    assert value[-1] == "0"
                    yield output(value[:-1] + str(ones) + suffix)
                else:
                    yield output(str(value) + str(ones) + suffix)
            elif ones < 10:
                if value % 10 == 0:
                    yield output(str(value + ones) + suffix)
                else:
                    yield output(str(value) + str(ones) + suffix)
            else:  # eleven to nineteen
                if value % 100 == 0:
                    yield output(str(value + ones) + suffix)
                else:
                    yield output(str(value) + str(ones) + suffix)
            value = None
        elif current in self.tens:
            tens = self.tens[current]
            if value is None:
                value = tens
            elif isinstance(value, str):
                value = str(value) + str(tens)
            else:
                if value % 100 == 0:
                    value += tens
                else:
                    value = str(value) + str(tens)
        elif current in self.tens_suffixed:
            # ordinal or cardinal; yield the number right away
            tens, suffix = self.tens_suffixed[current]
            if value is None:
                yield output(str(tens) + suffix)
            elif isinstance(value, str):
                yield output(str(value) + str(tens) + suffix)
            else:
                if value % 100 == 0:
                    yield output(str(value + tens) + suffix)
                else:
                    yield output(str(value) + str(tens) + suffix)
        elif current in self.multipliers:
            multiplier = self.multipliers[current]
            if value is None:
                value = multiplier
            elif isinstance(value, str) or value == 0:
                f = to_fraction(value)
                p = f * multiplier if f is not None else None
                if f is not None and p.denominator == 1:
                    value = p.numerator
                else:
                    yield output(value)
                    value = multiplier
            else:
                before = value // 1000 * 1000
                residual = value % 1000
                value = before + residual * multiplier
        elif current in self.multipliers_suffixed:
            multiplier, suffix = self.multipliers_suffixed[current]
            if value is None:
                yield output(str(multiplier) + suffix)
            elif isinstance(value, str):
                f = to_fraction(value)
                p = f * multiplier if f is not None else None
                if f is not None and p.denominator == 1:
                    yield output(str(p.numerator) + suffix)
                else:
                    yield output(value)
                    yield output(str(multiplier) + suffix)
            else:  # int
                before = value // 1000 * 1000
                residual = value % 1000
                value = before + residual * multiplier
                yield output(str(value) + suffix)
            value = None
        elif current in self.preceding_prefixers:
            # apply prefix (positive, minus, etc.) if it precedes a number
            if value is not None:
                yield output(value)

            if next in self.words or next_is_numeric:
                prefix = self.preceding_prefixers[current]
            else:
                yield output(current)
        elif current in self.following_prefixers:
            # apply prefix (dollars, cents, etc.) only after a number
            if value is not None:
                prefix = self.following_prefixers[current]
                yield output(value)
            else:
                yield output(current)
        elif current in self.suffixers:
            # apply suffix symbols (percent -> '%')
            if value is not None:
                suffix = self.suffixers[current]
                if isinstance(suffix, dict):
                    if next in suffix:
                        yield output(str(value) + suffix[next])
                        skip = True
                    else:
                        yield output(value)
                        yield output(current)
                else:
                    yield output(str(value) + suffix)
            else:
                yield output(current)
        elif current in self.specials:
            if next not in self.words and not next_is_numeric:
                # apply special handling only if the next word can be numeric
                if value is not None:
                    yield output(value)
                yield output(current)
            elif current == "and":
                # ignore "and" after hundreds, thousands, etc.
                if prev not in self.multipliers:
                    if value is not None:
                        yield output(value)
                    yield output(current)
            elif current == "double" or current == "triple":
                if next in self.ones or next in self.zeros:
                    repeats = 2 if current == "double" else 3
                    ones = self.ones.get(next, 0)
                    value = str(value or "") + str(ones) * repeats
                    skip = True
                else:
                    if value is not None:
                        yield output(value)
                    yield output(current)
            elif current == "point":
                if next in self.decimals or next_is_numeric:
                    value = str(value or "") + "."
            else:
                # should all have been covered at this point
                raise ValueError(f"Unexpected token: {current}")
        else:
            # all should have been covered at this point
            raise ValueError(f"Unexpected token: {current}")

    if value is not None:
        yield output(value)

def preprocess(self, s: str):
    # replace "<number> and a half" with "<number> point five"
    results = []

    segments = re.split(r"\band\s+a\s+half\b", s)
    for i, segment in enumerate(segments):
        if len(segment.strip()) == 0:
            continue
        if i == len(segments) - 1:
            results.append(segment)
        else:
            results.append(segment)
            last_word = segment.rsplit(maxsplit=2)[-1]
            if last_word in self.decimals or last_word in self.multipliers:
                results.append("point five")
            else:
                results.append("and a half")

    s = " ".join(results)

    # put a space at number/letter boundary
    s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
    s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)

    # but remove spaces which could be a suffix
    s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)

    return s

def postprocess(self, s: str):
    def combine_cents(m: Match):
        try:
            currency = m.group(1)
            integer = m.group(2)
            cents = int(m.group(3))
            return f"{currency}{integer}.{cents:02d}"
        except ValueError:
            return m.string

    def extract_cents(m: Match):
        try:
            return f"¢{int(m.group(1))}"
        except ValueError:
            return m.string

    # apply currency postprocessing; "$2 and ¢7" -> "$2.07"
    s = re.sub(r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b", combine_cents, s)
    s = re.sub(r"[€£$]0.([0-9]{1,2})\b", extract_cents, s)

    # write "one(s)" instead of "1(s)", just for the readability
    s = re.sub(r"\b1(s?)\b", r"one\1", s)

    return s

def __call__(self, s: str):
    s = self.preprocess(s)
    s = " ".join(word for word in self.process_words(s.split()) if word is not None)
    s = self.postprocess(s)

    return s

class EnglishSpellingNormalizer:
    """
    Applies British-American spelling mappings as listed in [1].

    [1] https://www.tysto.com/uk-us-spelling-list.html
    """

    def __init__(self):
        mapping_path = os.path.join(os.path.dirname(__file__), "english.json")
        # close the file handle once the mapping has been loaded
        with open(mapping_path) as f:
            self.mapping = json.load(f)

    def __call__(self, s: str):
        return " ".join(self.mapping.get(word, word) for word in s.split())

class EnglishTextNormalizer:
    def __init__(self):
        self.ignore_patterns = r"\b(hmm|mm|mhm|mmm|uh|um)\b"
        self.replacers = {
            # common contractions
            r"\bwon't\b": "will not",
            r"\bcan't\b": "can not",
            r"\blet's\b": "let us",
            r"\bain't\b": "aint",
            r"\by'all\b": "you all",
            r"\bwanna\b": "want to",
            r"\bgotta\b": "got to",
            r"\bgonna\b": "going to",
            r"\bi'ma\b": "i am going to",
            r"\bimma\b": "i am going to",
            r"\bwoulda\b": "would have",
            r"\bcoulda\b": "could have",
            r"\bshoulda\b": "should have",
            r"\bma'am\b": "madam",
            # contractions in titles/prefixes
            r"\bmr\b": "mister ",
            r"\bmrs\b": "missus ",
            r"\bst\b": "saint ",
            r"\bdr\b": "doctor ",
            r"\bprof\b": "professor ",
            r"\bcapt\b": "captain ",
            r"\bgov\b": "governor ",
            r"\bald\b": "alderman ",
            r"\bgen\b": "general ",
            r"\bsen\b": "senator ",
            r"\brep\b": "representative ",
            r"\bpres\b": "president ",
            r"\brev\b": "reverend ",
            r"\bhon\b": "honorable ",
            r"\basst\b": "assistant ",
            r"\bassoc\b": "associate ",
            r"\blt\b": "lieutenant ",
            r"\bcol\b": "colonel ",
            r"\bjr\b": "junior ",
            r"\bsr\b": "senior ",
            r"\besq\b": "esquire ",
            # perfect tenses, ideally it should be any past participles, but it's harder..
            r"'d been\b": " had been",
            r"'s been\b": " has been",
            r"'d gone\b": " had gone",
            r"'s gone\b": " has gone",
            r"'d done\b": " had done",  # "'s done" is ambiguous
            r"'s got\b": " has got",
            # general contractions
            r"n't\b": " not",
            r"'re\b": " are",
            r"'s\b": " is",
            r"'d\b": " would",
            r"'ll\b": " will",
            r"'t\b": " not",
            r"'ve\b": " have",
            r"'m\b": " am",
        }
        self.standardize_numbers = EnglishNumberNormalizer()
        self.standardize_spellings = EnglishSpellingNormalizer()

    def __call__(self, s: str):
        s = s.lower()

        s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # remove words between brackets
        s = re.sub(r"\(([^)]+?)\)", "", s)  # remove words between parenthesis
        s = re.sub(self.ignore_patterns, "", s)
        s = re.sub(r"\s+'", "'", s)  # when there's a space before an apostrophe

        for pattern, replacement in self.replacers.items():
            s = re.sub(pattern, replacement, s)

        s = re.sub(r"(\d),(\d)", r"\1\2", s)  # remove commas between digits
        s = re.sub(r"\.([^0-9]|$)", r" \1", s)  # remove periods not followed by numbers
        s = remove_symbols_and_diacritics(s, keep=".%$¢€£")  # keep numeric symbols

        s = self.standardize_numbers(s)
        s = self.standardize_spellings(s)

        # now remove prefix/suffix symbols that are not preceded/followed by numbers
        s = re.sub(r"[.$¢€£]([^0-9])", r" \1", s)
        s = re.sub(r"([^0-9])%", r"\1 ", s)

        s = re.sub(r"\s+", " ", s)  # replace any successive whitespaces with a space

        return s
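
The normalizer above is usually applied to both the reference and the prediction before scoring, so that fillers, bracketed annotations, and contractions do not count as recognition errors. The following is a reduced sketch of that idea, not the full class. The name `mini_normalize`, the four-entry contraction table, the generic punctuation regex (standing in for `remove_symbols_and_diacritics`), and the final `strip()` are all simplifying assumptions made for illustration.

```python
import re

# Filler words to drop, same pattern as ignore_patterns above
FILLERS = r"\b(hmm|mm|mhm|mmm|uh|um)\b"

# Small illustrative subset of the replacers table; specific rules come first
# so that "won't" is expanded before the generic "n't" rule can see it
CONTRACTIONS = {
    r"\bwon't\b": "will not",
    r"\bcan't\b": "can not",
    r"n't\b": " not",
    r"'re\b": " are",
}


def mini_normalize(s: str) -> str:
    s = s.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # remove text between brackets
    s = re.sub(r"\(([^)]+?)\)", "", s)  # remove text between parentheses
    s = re.sub(FILLERS, "", s)
    for pattern, replacement in CONTRACTIONS.items():
        s = re.sub(pattern, replacement, s)
    s = re.sub(r"[^\w\s]", " ", s)  # assumption: treat all remaining punctuation as a space
    s = re.sub(r"\s+", " ", s)  # collapse runs of whitespace
    return s.strip()


if __name__ == "__main__":
    print(mini_normalize("Um, we WON'T land [noise] (static) yet."))
    # we will not land yet
```

Because Python dictionaries preserve insertion order, the rule order in the table is also the order of application, which is why the specific contractions are listed before the generic suffix rules.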

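Two regex steps inside `EnglishNumberNormalizer` are easy to misread: the spacing at number/letter boundaries in `preprocess`, and the merging of a currency amount with a following cents amount in `postprocess`. The sketch below isolates both with the same patterns. The standalone function names `space_number_boundaries` and `merge_cents` are assumptions for illustration, and the error handling of the original is omitted.

```python
import re


def space_number_boundaries(s: str) -> str:
    # put a space at every letter/digit boundary
    s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
    s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)
    # then re-attach ordinal and plural suffixes such as "th" or "s"
    s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)
    return s


def merge_cents(s: str) -> str:
    def combine(m: "re.Match") -> str:
        currency, integer, cents = m.group(1), m.group(2), int(m.group(3))
        return f"{currency}{integer}.{cents:02d}"  # pad cents to two digits

    return re.sub(r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b", combine, s)


if __name__ == "__main__":
    print(space_number_boundaries("level3 in the 1960s on the 274th day"))
    # level 3 in the 1960s on the 274th day
    print(merge_cents("$2 and ¢7"))
    # $2.07
```

The suffix step is what keeps forms like `1960s` and `274th` intact after the boundary split has temporarily separated them.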
./utils/english.json

{
"accessorise": "accessorize",
"accessorised": "accessorized",
"accessorises": "accessorizes",
"accessorising": "accessorizing",
"acclimatisation": "acclimatization",
"acclimatise": "acclimatize",
"acclimatised": "acclimatized",
"acclimatises": "acclimatizes",
"acclimatising": "acclimatizing",
"accoutrements": "accouterments",
"aeon": "eon",
"aeons": "eons",
"aerogramme": "aerogram",
"aerogrammes": "aerograms",
"aeroplane": "airplane",
"aeroplanes": "airplanes",
"aesthete": "esthete",
"aesthetes": "esthetes",
"aesthetic": "esthetic",
"aesthetically": "esthetically",
"aesthetics": "esthetics",
"aetiology": "etiology",
"ageing": "aging",
"aggrandisement": "aggrandizement",
"agonise": "agonize",
"agonised": "agonized",
"agonises": "agonizes",
"agonising": "agonizing",
"agonisingly": "agonizingly",
"almanack": "almanac",
"almanacks": "almanacs",
"aluminium": "aluminum",
"amortisable": "amortizable",
"amortisation": "amortization",
"amortisations": "amortizations",
"amortise": "amortize",
"amortised": "amortized",
"amortises": "amortizes",
"amortising": "amortizing",
"amphitheatre": "amphitheater",
"amphitheatres": "amphitheaters",
"anaemia": "anemia",
"anaemic": "anemic",
"anaesthesia": "anesthesia",
"anaesthetic": "anesthetic",
"anaesthetics": "anesthetics",
"anaesthetise": "anesthetize",
"anaesthetised": "anesthetized",
"anaesthetises": "anesthetizes",
"anaesthetising": "anesthetizing",
"anaesthetist": "anesthetist",
"anaesthetists": "anesthetists",
"anaesthetize": "anesthetize",
"anaesthetized": "anesthetized",
"anaesthetizes": "anesthetizes",
"anaesthetizing": "anesthetizing",
"analogue": "analog",
"analogues": "analogs",
"analyse": "analyze",
"analysed": "analyzed",
"analyses": "analyzes",
"analysing": "analyzing",
"anglicise": "anglicize",
"anglicised": "anglicized",
"anglicises": "anglicizes",
"anglicising": "anglicizing",
"annualised": "annualized",
"antagonise": "antagonize",
"antagonised": "antagonized",
"antagonises": "antagonizes",
"antagonising": "antagonizing",
"apologise": "apologize",
"apologised": "apologized",
"apologises": "apologizes",
"apologising": "apologizing",
"appal": "appall",
"appals": "appalls",
"appetiser": "appetizer",
"appetisers": "appetizers",
"appetising": "appetizing",
"appetisingly": "appetizingly",
"arbour": "arbor",
"arbours": "arbors",
"archeological": "archaeological",
"archaeologically": "archeologically",
"archaeologist": "archeologist",
"archaeologists": "archeologists",
"archaeology": "archeology",
"ardour": "ardor",
"armour": "armor",
"armoured": "armored",
"armourer": "armorer",
"armourers": "armorers",
"armouries": "armories",
"armoury": "armory",
"artefact": "artifact",
"artefacts": "artifacts",
"authorise": "authorize",
"authorised": "authorized",
"authorises": "authorizes",
"authorising": "authorizing",
"axe": "ax",
"backpedalled": "backpedaled",
"backpedalling": "backpedaling",
"bannister": "banister",
"bannisters": "banisters",
"baptise": "baptize",
"baptised": "baptized",
"baptises": "baptizes",
"baptising": "baptizing",
"bastardise": "bastardize",
"bastardised": "bastardized",
"bastardises": "bastardizes",
"bastardising": "bastardizing",
"battleax": "battleaxe",
"baulk": "balk",
"baulked": "balked",
"baulking": "balking",
"baulks": "balks",
"bedevilled": "bedeviled",
"bedevilling": "bedeviling",
"behaviour": "behavior",
"behavioural": "behavioral",
"behaviourism": "behaviorism",
"behaviourist": "behaviorist",
"behaviourists": "behaviorists",
"behaviours": "behaviors",
"behove": "behoove",
"behoved": "behooved",
"behoves": "behooves",
"bejewelled": "bejeweled",
"belabour": "belabor",
"belaboured": "belabored",
"belabouring": "belaboring",
"belabours": "belabors",
"bevelled": "beveled",
"bevvies": "bevies",
"bevvy": "bevy",
"biassed": "biased",
"biassing": "biasing",
"bingeing": "binging",
"bougainvillaea": "bougainvillea",
"bougainvillaeas": "bougainvilleas",
"bowdlerise": "bowdlerize",
"bowdlerised": "bowdlerized",
"bowdlerises": "bowdlerizes",
"bowdlerising": "bowdlerizing",
"breathalyse": "breathalyze",
"breathalysed": "breathalyzed",
"breathalyser": "breathalyzer",
"breathalysers": "breathalyzers",
"breathalyses": "breathalyzes",
"breathalysing": "breathalyzing",
"brutalise": "brutalize",
"brutalised": "brutalized",
"brutalises": "brutalizes",
"brutalising": "brutalizing",
"busses": "buses",
"bussing": "busing",
"caesarean": "cesarean",
"caesareans": "cesareans",
"calibre": "caliber",
"calibres": "calibers",
"calliper": "caliper",
"callipers": "calipers",
"callisthenics": "calisthenics",
"canalise": "canalize",
"canalised": "canalized",
"canalises": "canalizes",
"canalising": "canalizing",
"cancelation": "cancellation",
"cancelations": "cancellations",
"cancelled": "canceled",
"cancelling": "canceling",
"candour": "candor",
"cannibalise": "cannibalize",
"cannibalised": "cannibalized",
"cannibalises": "cannibalizes",
"cannibalising": "cannibalizing",
"canonise": "canonize",
"canonised": "canonized",
"canonises": "canonizes",
"canonising": "canonizing",
"capitalise": "capitalize",
"capitalised": "capitalized",
"capitalises": "capitalizes",
"capitalising": "capitalizing",
"caramelise": "caramelize",
"caramelised": "caramelized",
"caramelises": "caramelizes",
"caramelising": "caramelizing",
"carbonise": "carbonize",
"carbonised": "carbonized",
"carbonises": "carbonizes",
"carbonising": "carbonizing",
"carolled": "caroled",
"carolling": "caroling",
"catalogue": "catalog",
"catalogued": "cataloged",
"catalogues": "catalogs",
"cataloguing": "cataloging",
"catalyse": "catalyze",
"catalysed": "catalyzed",
"catalyses": "catalyzes",
"catalysing": "catalyzing",
"categorise": "categorize",
"categorised": "categorized",
"categorises": "categorizes",
"categorising": "categorizing",
"cauterise": "cauterize",
"cauterised": "cauterized",
"cauterises": "cauterizes",
"cauterising": "cauterizing",
"cavilled": "caviled",
"cavilling": "caviling",
"centigramme": "centigram",
"centigrammes": "centigrams",
"centilitre": "centiliter",
"centilitres": "centiliters",
"centimetre": "centimeter",
"centimetres": "centimeters",
"centralise": "centralize",
"centralised": "centralized",
"centralises": "centralizes",
"centralising": "centralizing",
"centre": "center",
"centred": "centered",
"centrefold": "centerfold",
"centrefolds": "centerfolds",
"centrepiece": "centerpiece",
"centrepieces": "centerpieces",
"centres": "centers",
"channelled": "channeled",
"channelling": "channeling",
"characterise": "characterize",
"characterised": "characterized",
"characterises": "characterizes",
"characterising": "characterizing",
"cheque": "check",
"chequebook": "checkbook",
"chequebooks": "checkbooks",
"chequered": "checkered",
"cheques": "checks",
"chilli": "chili",
"chimaera": "chimera",
"chimaeras": "chimeras",
"chiselled": "chiseled",
"chiselling": "chiseling",
"circularise": "circularize",
"circularised": "circularized",
"circularises": "circularizes",
"circularising": "circularizing",
"civilise": "civilize",
"civilised": "civilized",
"civilises": "civilizes",
"civilising": "civilizing",
ā€œclamourā€: ā€œclamorā€,
ā€œclamouredā€: ā€œclamoredā€,
ā€œclamouringā€: ā€œclamoringā€,
ā€œclamoursā€: ā€œclamorsā€,
ā€œclangourā€: ā€œclangorā€,
ā€œclarinettistā€: ā€œclarinetistā€,
ā€œclarinettistsā€: ā€œclarinetistsā€,
ā€œcollectiviseā€: ā€œcollectivizeā€,
ā€œcollectivisedā€: ā€œcollectivizedā€,
ā€œcollectivisesā€: ā€œcollectivizesā€,
ā€œcollectivisingā€: ā€œcollectivizingā€,
ā€œcolonisationā€: ā€œcolonizationā€,
ā€œcoloniseā€: ā€œcolonizeā€,
ā€œcolonisedā€: ā€œcolonizedā€,
ā€œcoloniserā€: ā€œcolonizerā€,
ā€œcolonisersā€: ā€œcolonizersā€,
ā€œcolonisesā€: ā€œcolonizesā€,
ā€œcolonisingā€: ā€œcolonizingā€,
ā€œcolourā€: ā€œcolorā€,
ā€œcolourantā€: ā€œcolorantā€,
ā€œcolourantsā€: ā€œcolorantsā€,
ā€œcolouredā€: ā€œcoloredā€,
ā€œcolouredsā€: ā€œcoloredsā€,
ā€œcolourfulā€: ā€œcolorfulā€,
ā€œcolourfullyā€: ā€œcolorfullyā€,
ā€œcolouringā€: ā€œcoloringā€,
ā€œcolourizeā€: ā€œcolorizeā€,
ā€œcolourizedā€: ā€œcolorizedā€,
ā€œcolourizesā€: ā€œcolorizesā€,
ā€œcolourizingā€: ā€œcolorizingā€,
ā€œcolourlessā€: ā€œcolorlessā€,
ā€œcoloursā€: ā€œcolorsā€,
ā€œcommercialiseā€: ā€œcommercializeā€,
ā€œcommercialisedā€: ā€œcommercializedā€,
ā€œcommercialisesā€: ā€œcommercializesā€,
ā€œcommercialisingā€: ā€œcommercializingā€,
ā€œcompartmentaliseā€: ā€œcompartmentalizeā€,
ā€œcompartmentalisedā€: ā€œcompartmentalizedā€,
ā€œcompartmentalisesā€: ā€œcompartmentalizesā€,
ā€œcompartmentalisingā€: ā€œcompartmentalizingā€,
ā€œcomputeriseā€: ā€œcomputerizeā€,
ā€œcomputerisedā€: ā€œcomputerizedā€,
ā€œcomputerisesā€: ā€œcomputerizesā€,
ā€œcomputerisingā€: ā€œcomputerizingā€,
ā€œconceptualiseā€: ā€œconceptualizeā€,
ā€œconceptualisedā€: ā€œconceptualizedā€,
ā€œconceptualisesā€: ā€œconceptualizesā€,
ā€œconceptualisingā€: ā€œconceptualizingā€,
ā€œconnexionā€: ā€œconnectionā€,
ā€œconnexionsā€: ā€œconnectionsā€,
ā€œcontextualiseā€: ā€œcontextualizeā€,
ā€œcontextualisedā€: ā€œcontextualizedā€,
ā€œcontextualisesā€: ā€œcontextualizesā€,
ā€œcontextualisingā€: ā€œcontextualizingā€,
ā€œcosierā€: ā€œcozierā€,
ā€œcosiesā€: ā€œcoziesā€,
ā€œcosiestā€: ā€œcoziestā€,
ā€œcosilyā€: ā€œcozilyā€,
ā€œcosinessā€: ā€œcozinessā€,
ā€œcosyā€: ā€œcozyā€,
ā€œcouncillorā€: ā€œcouncilorā€,
ā€œcouncillorsā€: ā€œcouncilorsā€,
ā€œcounselledā€: ā€œcounseledā€,
ā€œcounsellingā€: ā€œcounselingā€,
ā€œcounsellorā€: ā€œcounselorā€,
ā€œcounsellorsā€: ā€œcounselorsā€,
"crenellated": "crenelated",
ā€œcriminaliseā€: ā€œcriminalizeā€,
ā€œcriminalisedā€: ā€œcriminalizedā€,
ā€œcriminalisesā€: ā€œcriminalizesā€,
ā€œcriminalisingā€: ā€œcriminalizingā€,
ā€œcriticiseā€: ā€œcriticizeā€,
ā€œcriticisedā€: ā€œcriticizedā€,
ā€œcriticisesā€: ā€œcriticizesā€,
ā€œcriticisingā€: ā€œcriticizingā€,
ā€œcruellerā€: ā€œcruelerā€,
ā€œcruellestā€: ā€œcruelestā€,
ā€œcrystallisationā€: ā€œcrystallizationā€,
ā€œcrystalliseā€: ā€œcrystallizeā€,
ā€œcrystallisedā€: ā€œcrystallizedā€,
ā€œcrystallisesā€: ā€œcrystallizesā€,
ā€œcrystallisingā€: ā€œcrystallizingā€,
ā€œcudgelledā€: ā€œcudgeledā€,
ā€œcudgellingā€: ā€œcudgelingā€,
ā€œcustomiseā€: ā€œcustomizeā€,
ā€œcustomisedā€: ā€œcustomizedā€,
ā€œcustomisesā€: ā€œcustomizesā€,
ā€œcustomisingā€: ā€œcustomizingā€,
ā€œcypherā€: ā€œcipherā€,
ā€œcyphersā€: ā€œciphersā€,
ā€œdecentralisationā€: ā€œdecentralizationā€,
ā€œdecentraliseā€: ā€œdecentralizeā€,
ā€œdecentralisedā€: ā€œdecentralizedā€,
ā€œdecentralisesā€: ā€œdecentralizesā€,
ā€œdecentralisingā€: ā€œdecentralizingā€,
ā€œdecriminalisationā€: ā€œdecriminalizationā€,
ā€œdecriminaliseā€: ā€œdecriminalizeā€,
ā€œdecriminalisedā€: ā€œdecriminalizedā€,
ā€œdecriminalisesā€: ā€œdecriminalizesā€,
ā€œdecriminalisingā€: ā€œdecriminalizingā€,
ā€œdefenceā€: ā€œdefenseā€,
ā€œdefencelessā€: ā€œdefenselessā€,
ā€œdefencesā€: ā€œdefensesā€,
ā€œdehumanisationā€: ā€œdehumanizationā€,
ā€œdehumaniseā€: ā€œdehumanizeā€,
ā€œdehumanisedā€: ā€œdehumanizedā€,
ā€œdehumanisesā€: ā€œdehumanizesā€,
ā€œdehumanisingā€: ā€œdehumanizingā€,
ā€œdemeanourā€: ā€œdemeanorā€,
ā€œdemilitarisationā€: ā€œdemilitarizationā€,
ā€œdemilitariseā€: ā€œdemilitarizeā€,
ā€œdemilitarisedā€: ā€œdemilitarizedā€,
ā€œdemilitarisesā€: ā€œdemilitarizesā€,
ā€œdemilitarisingā€: ā€œdemilitarizingā€,
ā€œdemobilisationā€: ā€œdemobilizationā€,
ā€œdemobiliseā€: ā€œdemobilizeā€,
ā€œdemobilisedā€: ā€œdemobilizedā€,
ā€œdemobilisesā€: ā€œdemobilizesā€,
ā€œdemobilisingā€: ā€œdemobilizingā€,
ā€œdemocratisationā€: ā€œdemocratizationā€,
ā€œdemocratiseā€: ā€œdemocratizeā€,
ā€œdemocratisedā€: ā€œdemocratizedā€,
ā€œdemocratisesā€: ā€œdemocratizesā€,
ā€œdemocratisingā€: ā€œdemocratizingā€,
ā€œdemoniseā€: ā€œdemonizeā€,
ā€œdemonisedā€: ā€œdemonizedā€,
ā€œdemonisesā€: ā€œdemonizesā€,
ā€œdemonisingā€: ā€œdemonizingā€,
ā€œdemoralisationā€: ā€œdemoralizationā€,
ā€œdemoraliseā€: ā€œdemoralizeā€,
ā€œdemoralisedā€: ā€œdemoralizedā€,
ā€œdemoralisesā€: ā€œdemoralizesā€,
ā€œdemoralisingā€: ā€œdemoralizingā€,
ā€œdenationalisationā€: ā€œdenationalizationā€,
ā€œdenationaliseā€: ā€œdenationalizeā€,
ā€œdenationalisedā€: ā€œdenationalizedā€,
ā€œdenationalisesā€: ā€œdenationalizesā€,
ā€œdenationalisingā€: ā€œdenationalizingā€,
ā€œdeodoriseā€: ā€œdeodorizeā€,
ā€œdeodorisedā€: ā€œdeodorizedā€,
ā€œdeodorisesā€: ā€œdeodorizesā€,
ā€œdeodorisingā€: ā€œdeodorizingā€,
ā€œdepersonaliseā€: ā€œdepersonalizeā€,
ā€œdepersonalisedā€: ā€œdepersonalizedā€,
ā€œdepersonalisesā€: ā€œdepersonalizesā€,
ā€œdepersonalisingā€: ā€œdepersonalizingā€,
ā€œdeputiseā€: ā€œdeputizeā€,
ā€œdeputisedā€: ā€œdeputizedā€,
ā€œdeputisesā€: ā€œdeputizesā€,
ā€œdeputisingā€: ā€œdeputizingā€,
ā€œdesensitisationā€: ā€œdesensitizationā€,
ā€œdesensitiseā€: ā€œdesensitizeā€,
ā€œdesensitisedā€: ā€œdesensitizedā€,
ā€œdesensitisesā€: ā€œdesensitizesā€,
ā€œdesensitisingā€: ā€œdesensitizingā€,
ā€œdestabilisationā€: ā€œdestabilizationā€,
ā€œdestabiliseā€: ā€œdestabilizeā€,
ā€œdestabilisedā€: ā€œdestabilizedā€,
ā€œdestabilisesā€: ā€œdestabilizesā€,
ā€œdestabilisingā€: ā€œdestabilizingā€,
ā€œdialledā€: ā€œdialedā€,
ā€œdiallingā€: ā€œdialingā€,
ā€œdialogueā€: ā€œdialogā€,
ā€œdialoguesā€: ā€œdialogsā€,
ā€œdiarrhoeaā€: ā€œdiarrheaā€,
ā€œdigitiseā€: ā€œdigitizeā€,
ā€œdigitisedā€: ā€œdigitizedā€,
ā€œdigitisesā€: ā€œdigitizesā€,
ā€œdigitisingā€: ā€œdigitizingā€,
ā€œdiscā€: ā€œdiskā€,
ā€œdiscolourā€: ā€œdiscolorā€,
ā€œdiscolouredā€: ā€œdiscoloredā€,
ā€œdiscolouringā€: ā€œdiscoloringā€,
ā€œdiscoloursā€: ā€œdiscolorsā€,
ā€œdiscsā€: ā€œdisksā€,
ā€œdisembowelledā€: ā€œdisemboweledā€,
ā€œdisembowellingā€: ā€œdisembowelingā€,
ā€œdisfavourā€: ā€œdisfavorā€,
ā€œdishevelledā€: ā€œdisheveledā€,
ā€œdishonourā€: ā€œdishonorā€,
ā€œdishonourableā€: ā€œdishonorableā€,
ā€œdishonourablyā€: ā€œdishonorablyā€,
ā€œdishonouredā€: ā€œdishonoredā€,
ā€œdishonouringā€: ā€œdishonoringā€,
ā€œdishonoursā€: ā€œdishonorsā€,
ā€œdisorganisationā€: ā€œdisorganizationā€,
ā€œdisorganisedā€: ā€œdisorganizedā€,
ā€œdistilā€: ā€œdistillā€,
ā€œdistilsā€: ā€œdistillsā€,
ā€œdramatisationā€: ā€œdramatizationā€,
ā€œdramatisationsā€: ā€œdramatizationsā€,
ā€œdramatiseā€: ā€œdramatizeā€,
ā€œdramatisedā€: ā€œdramatizedā€,
ā€œdramatisesā€: ā€œdramatizesā€,
ā€œdramatisingā€: ā€œdramatizingā€,
ā€œdraughtā€: ā€œdraftā€,
ā€œdraughtboardā€: ā€œdraftboardā€,
ā€œdraughtboardsā€: ā€œdraftboardsā€,
ā€œdraughtierā€: ā€œdraftierā€,
ā€œdraughtiestā€: ā€œdraftiestā€,
ā€œdraughtsā€: ā€œdraftsā€,
ā€œdraughtsmanā€: ā€œdraftsmanā€,
ā€œdraughtsmanshipā€: ā€œdraftsmanshipā€,
ā€œdraughtsmenā€: ā€œdraftsmenā€,
ā€œdraughtswomanā€: ā€œdraftswomanā€,
ā€œdraughtswomenā€: ā€œdraftswomenā€,
ā€œdraughtyā€: ā€œdraftyā€,
ā€œdrivelledā€: ā€œdriveledā€,
ā€œdrivellingā€: ā€œdrivelingā€,
ā€œduelledā€: ā€œdueledā€,
ā€œduellingā€: ā€œduelingā€,
ā€œeconomiseā€: ā€œeconomizeā€,
ā€œeconomisedā€: ā€œeconomizedā€,
ā€œeconomisesā€: ā€œeconomizesā€,
ā€œeconomisingā€: ā€œeconomizingā€,
"oedema": "edema",
ā€œeditorialiseā€: ā€œeditorializeā€,
ā€œeditorialisedā€: ā€œeditorializedā€,
ā€œeditorialisesā€: ā€œeditorializesā€,
ā€œeditorialisingā€: ā€œeditorializingā€,
ā€œempathiseā€: ā€œempathizeā€,
ā€œempathisedā€: ā€œempathizedā€,
ā€œempathisesā€: ā€œempathizesā€,
ā€œempathisingā€: ā€œempathizingā€,
ā€œemphasiseā€: ā€œemphasizeā€,
ā€œemphasisedā€: ā€œemphasizedā€,
ā€œemphasisesā€: ā€œemphasizesā€,
ā€œemphasisingā€: ā€œemphasizingā€,
ā€œenamelledā€: ā€œenameledā€,
ā€œenamellingā€: ā€œenamelingā€,
ā€œenamouredā€: ā€œenamoredā€,
ā€œencyclopaediaā€: ā€œencyclopediaā€,
ā€œencyclopaediasā€: ā€œencyclopediasā€,
ā€œencyclopaedicā€: ā€œencyclopedicā€,
ā€œendeavourā€: ā€œendeavorā€,
ā€œendeavouredā€: ā€œendeavoredā€,
ā€œendeavouringā€: ā€œendeavoringā€,
ā€œendeavoursā€: ā€œendeavorsā€,
ā€œenergiseā€: ā€œenergizeā€,
ā€œenergisedā€: ā€œenergizedā€,
ā€œenergisesā€: ā€œenergizesā€,
ā€œenergisingā€: ā€œenergizingā€,
ā€œenrolā€: ā€œenrollā€,
ā€œenrolsā€: ā€œenrollsā€,
ā€œenthralā€: ā€œenthrallā€,
ā€œenthralsā€: ā€œenthrallsā€,
ā€œepauletteā€: ā€œepauletā€,
ā€œepaulettesā€: ā€œepauletsā€,
ā€œepicentreā€: ā€œepicenterā€,
ā€œepicentresā€: ā€œepicentersā€,
ā€œepilogueā€: ā€œepilogā€,
ā€œepiloguesā€: ā€œepilogsā€,
ā€œepitomiseā€: ā€œepitomizeā€,
ā€œepitomisedā€: ā€œepitomizedā€,
ā€œepitomisesā€: ā€œepitomizesā€,
ā€œepitomisingā€: ā€œepitomizingā€,
ā€œequalisationā€: ā€œequalizationā€,
ā€œequaliseā€: ā€œequalizeā€,
ā€œequalisedā€: ā€œequalizedā€,
ā€œequaliserā€: ā€œequalizerā€,
ā€œequalisersā€: ā€œequalizersā€,
ā€œequalisesā€: ā€œequalizesā€,
ā€œequalisingā€: ā€œequalizingā€,
ā€œeulogiseā€: ā€œeulogizeā€,
ā€œeulogisedā€: ā€œeulogizedā€,
ā€œeulogisesā€: ā€œeulogizesā€,
ā€œeulogisingā€: ā€œeulogizingā€,
ā€œevangeliseā€: ā€œevangelizeā€,
ā€œevangelisedā€: ā€œevangelizedā€,
ā€œevangelisesā€: ā€œevangelizesā€,
ā€œevangelisingā€: ā€œevangelizingā€,
ā€œexorciseā€: ā€œexorcizeā€,
ā€œexorcisedā€: ā€œexorcizedā€,
ā€œexorcisesā€: ā€œexorcizesā€,
ā€œexorcisingā€: ā€œexorcizingā€,
ā€œextemporisationā€: ā€œextemporizationā€,
ā€œextemporiseā€: ā€œextemporizeā€,
ā€œextemporisedā€: ā€œextemporizedā€,
ā€œextemporisesā€: ā€œextemporizesā€,
ā€œextemporisingā€: ā€œextemporizingā€,
ā€œexternalisationā€: ā€œexternalizationā€,
ā€œexternalisationsā€: ā€œexternalizationsā€,
ā€œexternaliseā€: ā€œexternalizeā€,
ā€œexternalisedā€: ā€œexternalizedā€,
ā€œexternalisesā€: ā€œexternalizesā€,
ā€œexternalisingā€: ā€œexternalizingā€,
ā€œfactoriseā€: ā€œfactorizeā€,
ā€œfactorisedā€: ā€œfactorizedā€,
ā€œfactorisesā€: ā€œfactorizesā€,
ā€œfactorisingā€: ā€œfactorizingā€,
ā€œfaecalā€: ā€œfecalā€,
ā€œfaecesā€: ā€œfecesā€,
ā€œfamiliarisationā€: ā€œfamiliarizationā€,
ā€œfamiliariseā€: ā€œfamiliarizeā€,
ā€œfamiliarisedā€: ā€œfamiliarizedā€,
ā€œfamiliarisesā€: ā€œfamiliarizesā€,
ā€œfamiliarisingā€: ā€œfamiliarizingā€,
ā€œfantasiseā€: ā€œfantasizeā€,
ā€œfantasisedā€: ā€œfantasizedā€,
ā€œfantasisesā€: ā€œfantasizesā€,
ā€œfantasisingā€: ā€œfantasizingā€,
ā€œfavourā€: ā€œfavorā€,
ā€œfavourableā€: ā€œfavorableā€,
ā€œfavourablyā€: ā€œfavorablyā€,
ā€œfavouredā€: ā€œfavoredā€,
ā€œfavouringā€: ā€œfavoringā€,
ā€œfavouriteā€: ā€œfavoriteā€,
ā€œfavouritesā€: ā€œfavoritesā€,
ā€œfavouritismā€: ā€œfavoritismā€,
ā€œfavoursā€: ā€œfavorsā€,
ā€œfeminiseā€: ā€œfeminizeā€,
ā€œfeminisedā€: ā€œfeminizedā€,
ā€œfeminisesā€: ā€œfeminizesā€,
ā€œfeminisingā€: ā€œfeminizingā€,
ā€œfertilisationā€: ā€œfertilizationā€,
ā€œfertiliseā€: ā€œfertilizeā€,
ā€œfertilisedā€: ā€œfertilizedā€,
ā€œfertiliserā€: ā€œfertilizerā€,
ā€œfertilisersā€: ā€œfertilizersā€,
ā€œfertilisesā€: ā€œfertilizesā€,
ā€œfertilisingā€: ā€œfertilizingā€,
ā€œfervourā€: ā€œfervorā€,
ā€œfibreā€: ā€œfiberā€,
ā€œfibreglassā€: ā€œfiberglassā€,
ā€œfibresā€: ā€œfibersā€,
ā€œfictionalisationā€: ā€œfictionalizationā€,
ā€œfictionalisationsā€: ā€œfictionalizationsā€,
ā€œfictionaliseā€: ā€œfictionalizeā€,
ā€œfictionalisedā€: ā€œfictionalizedā€,
ā€œfictionalisesā€: ā€œfictionalizesā€,
ā€œfictionalisingā€: ā€œfictionalizingā€,
ā€œfilletā€: ā€œfiletā€,
ā€œfilletedā€: ā€œfiletedā€,
ā€œfilletingā€: ā€œfiletingā€,
ā€œfilletsā€: ā€œfiletsā€,
ā€œfinalisationā€: ā€œfinalizationā€,
ā€œfinaliseā€: ā€œfinalizeā€,
ā€œfinalisedā€: ā€œfinalizedā€,
ā€œfinalisesā€: ā€œfinalizesā€,
ā€œfinalisingā€: ā€œfinalizingā€,
ā€œflautistā€: ā€œflutistā€,
ā€œflautistsā€: ā€œflutistsā€,
ā€œflavourā€: ā€œflavorā€,
ā€œflavouredā€: ā€œflavoredā€,
ā€œflavouringā€: ā€œflavoringā€,
ā€œflavouringsā€: ā€œflavoringsā€,
ā€œflavourlessā€: ā€œflavorlessā€,
ā€œflavoursā€: ā€œflavorsā€,
ā€œflavoursomeā€: ā€œflavorsomeā€,
ā€œflyer / flierā€: ā€œflier / flyerā€,
ā€œfoetalā€: ā€œfetalā€,
ā€œfoetidā€: ā€œfetidā€,
ā€œfoetusā€: ā€œfetusā€,
ā€œfoetusesā€: ā€œfetusesā€,
ā€œformalisationā€: ā€œformalizationā€,
ā€œformaliseā€: ā€œformalizeā€,
ā€œformalisedā€: ā€œformalizedā€,
ā€œformalisesā€: ā€œformalizesā€,
ā€œformalisingā€: ā€œformalizingā€,
ā€œfossilisationā€: ā€œfossilizationā€,
ā€œfossiliseā€: ā€œfossilizeā€,
ā€œfossilisedā€: ā€œfossilizedā€,
ā€œfossilisesā€: ā€œfossilizesā€,
ā€œfossilisingā€: ā€œfossilizingā€,
ā€œfraternisationā€: ā€œfraternizationā€,
ā€œfraterniseā€: ā€œfraternizeā€,
ā€œfraternisedā€: ā€œfraternizedā€,
ā€œfraternisesā€: ā€œfraternizesā€,
ā€œfraternisingā€: ā€œfraternizingā€,
ā€œfulfilā€: ā€œfulfillā€,
ā€œfulfilmentā€: ā€œfulfillmentā€,
ā€œfulfilsā€: ā€œfulfillsā€,
ā€œfunnelledā€: ā€œfunneledā€,
ā€œfunnellingā€: ā€œfunnelingā€,
ā€œgalvaniseā€: ā€œgalvanizeā€,
ā€œgalvanisedā€: ā€œgalvanizedā€,
ā€œgalvanisesā€: ā€œgalvanizesā€,
ā€œgalvanisingā€: ā€œgalvanizingā€,
ā€œgambolledā€: ā€œgamboledā€,
ā€œgambollingā€: ā€œgambolingā€,
ā€œgaolā€: ā€œjailā€,
ā€œgaolbirdā€: ā€œjailbirdā€,
ā€œgaolbirdsā€: ā€œjailbirdsā€,
ā€œgaolbreakā€: ā€œjailbreakā€,
ā€œgaolbreaksā€: ā€œjailbreaksā€,
ā€œgaoledā€: ā€œjailedā€,
ā€œgaolerā€: ā€œjailerā€,
ā€œgaolersā€: ā€œjailersā€,
ā€œgaolingā€: ā€œjailingā€,
ā€œgaolsā€: ā€œjailsā€,
ā€œgassesā€: ā€œgasesā€,
ā€œgageā€: ā€œgaugeā€,
ā€œgagedā€: ā€œgaugedā€,
ā€œgagesā€: ā€œgaugesā€,
ā€œgagingā€: ā€œgaugingā€,
ā€œgeneralisationā€: ā€œgeneralizationā€,
ā€œgeneralisationsā€: ā€œgeneralizationsā€,
ā€œgeneraliseā€: ā€œgeneralizeā€,
ā€œgeneralisedā€: ā€œgeneralizedā€,
ā€œgeneralisesā€: ā€œgeneralizesā€,
ā€œgeneralisingā€: ā€œgeneralizingā€,
ā€œghettoiseā€: ā€œghettoizeā€,
ā€œghettoisedā€: ā€œghettoizedā€,
ā€œghettoisesā€: ā€œghettoizesā€,
ā€œghettoisingā€: ā€œghettoizingā€,
ā€œgipsiesā€: ā€œgypsiesā€,
ā€œglamoriseā€: ā€œglamorizeā€,
ā€œglamorisedā€: ā€œglamorizedā€,
ā€œglamorisesā€: ā€œglamorizesā€,
ā€œglamorisingā€: ā€œglamorizingā€,
ā€œglamorā€: ā€œglamourā€,
ā€œglobalisationā€: ā€œglobalizationā€,
ā€œglobaliseā€: ā€œglobalizeā€,
ā€œglobalisedā€: ā€œglobalizedā€,
ā€œglobalisesā€: ā€œglobalizesā€,
ā€œglobalisingā€: ā€œglobalizingā€,
ā€œglueingā€: ā€œgluingā€,
ā€œgoitreā€: ā€œgoiterā€,
ā€œgoitresā€: ā€œgoitersā€,
ā€œgonorrhoeaā€: ā€œgonorrheaā€,
ā€œgrammeā€: ā€œgramā€,
ā€œgrammesā€: ā€œgramsā€,
ā€œgravelledā€: ā€œgraveledā€,
ā€œgreyā€: ā€œgrayā€,
ā€œgreyedā€: ā€œgrayedā€,
ā€œgreyingā€: ā€œgrayingā€,
ā€œgreyishā€: ā€œgrayishā€,
ā€œgreynessā€: ā€œgraynessā€,
ā€œgreysā€: ā€œgraysā€,
ā€œgrovelledā€: ā€œgroveledā€,
ā€œgrovellingā€: ā€œgrovelingā€,
ā€œgroyneā€: ā€œgroinā€,
ā€œgroynesā€: ā€œgroinsā€,
ā€œgruellingā€: ā€œgruelingā€,
ā€œgruellinglyā€: ā€œgruelinglyā€,
ā€œgryphonā€: ā€œgriffinā€,
ā€œgryphonsā€: ā€œgriffinsā€,
ā€œgynaecologicalā€: ā€œgynecologicalā€,
ā€œgynaecologistā€: ā€œgynecologistā€,
ā€œgynaecologistsā€: ā€œgynecologistsā€,
ā€œgynaecologyā€: ā€œgynecologyā€,
ā€œhaematologicalā€: ā€œhematologicalā€,
ā€œhaematologistā€: ā€œhematologistā€,
ā€œhaematologistsā€: ā€œhematologistsā€,
ā€œhaematologyā€: ā€œhematologyā€,
ā€œhaemoglobinā€: ā€œhemoglobinā€,
ā€œhaemophiliaā€: ā€œhemophiliaā€,
ā€œhaemophiliacā€: ā€œhemophiliacā€,
ā€œhaemophiliacsā€: ā€œhemophiliacsā€,
ā€œhaemorrhageā€: ā€œhemorrhageā€,
ā€œhaemorrhagedā€: ā€œhemorrhagedā€,
ā€œhaemorrhagesā€: ā€œhemorrhagesā€,
ā€œhaemorrhagingā€: ā€œhemorrhagingā€,
ā€œhaemorrhoidsā€: ā€œhemorrhoidsā€,
ā€œharbourā€: ā€œharborā€,
ā€œharbouredā€: ā€œharboredā€,
ā€œharbouringā€: ā€œharboringā€,
ā€œharboursā€: ā€œharborsā€,
ā€œharmonisationā€: ā€œharmonizationā€,
ā€œharmoniseā€: ā€œharmonizeā€,
ā€œharmonisedā€: ā€œharmonizedā€,
ā€œharmonisesā€: ā€œharmonizesā€,
ā€œharmonisingā€: ā€œharmonizingā€,
ā€œhomoeopathā€: ā€œhomeopathā€,
ā€œhomoeopathicā€: ā€œhomeopathicā€,
ā€œhomoeopathsā€: ā€œhomeopathsā€,
ā€œhomoeopathyā€: ā€œhomeopathyā€,
ā€œhomogeniseā€: ā€œhomogenizeā€,
ā€œhomogenisedā€: ā€œhomogenizedā€,
ā€œhomogenisesā€: ā€œhomogenizesā€,
ā€œhomogenisingā€: ā€œhomogenizingā€,
ā€œhonourā€: ā€œhonorā€,
ā€œhonourableā€: ā€œhonorableā€,
ā€œhonourablyā€: ā€œhonorablyā€,
ā€œhonouredā€: ā€œhonoredā€,
ā€œhonouringā€: ā€œhonoringā€,
ā€œhonoursā€: ā€œhonorsā€,
ā€œhospitalisationā€: ā€œhospitalizationā€,
ā€œhospitaliseā€: ā€œhospitalizeā€,
ā€œhospitalisedā€: ā€œhospitalizedā€,
ā€œhospitalisesā€: ā€œhospitalizesā€,
ā€œhospitalisingā€: ā€œhospitalizingā€,
ā€œhumaniseā€: ā€œhumanizeā€,
ā€œhumanisedā€: ā€œhumanizedā€,
ā€œhumanisesā€: ā€œhumanizesā€,
ā€œhumanisingā€: ā€œhumanizingā€,
ā€œhumourā€: ā€œhumorā€,
ā€œhumouredā€: ā€œhumoredā€,
ā€œhumouringā€: ā€œhumoringā€,
ā€œhumourlessā€: ā€œhumorlessā€,
ā€œhumoursā€: ā€œhumorsā€,
ā€œhybridiseā€: ā€œhybridizeā€,
ā€œhybridisedā€: ā€œhybridizedā€,
ā€œhybridisesā€: ā€œhybridizesā€,
ā€œhybridisingā€: ā€œhybridizingā€,
ā€œhypnotiseā€: ā€œhypnotizeā€,
ā€œhypnotisedā€: ā€œhypnotizedā€,
ā€œhypnotisesā€: ā€œhypnotizesā€,
ā€œhypnotisingā€: ā€œhypnotizingā€,
ā€œhypothesiseā€: ā€œhypothesizeā€,
ā€œhypothesisedā€: ā€œhypothesizedā€,
ā€œhypothesisesā€: ā€œhypothesizesā€,
ā€œhypothesisingā€: ā€œhypothesizingā€,
ā€œidealisationā€: ā€œidealizationā€,
ā€œidealiseā€: ā€œidealizeā€,
ā€œidealisedā€: ā€œidealizedā€,
ā€œidealisesā€: ā€œidealizesā€,
ā€œidealisingā€: ā€œidealizingā€,
ā€œidoliseā€: ā€œidolizeā€,
ā€œidolisedā€: ā€œidolizedā€,
ā€œidolisesā€: ā€œidolizesā€,
ā€œidolisingā€: ā€œidolizingā€,
ā€œimmobilisationā€: ā€œimmobilizationā€,
ā€œimmobiliseā€: ā€œimmobilizeā€,
ā€œimmobilisedā€: ā€œimmobilizedā€,
ā€œimmobiliserā€: ā€œimmobilizerā€,
ā€œimmobilisersā€: ā€œimmobilizersā€,
ā€œimmobilisesā€: ā€œimmobilizesā€,
ā€œimmobilisingā€: ā€œimmobilizingā€,
ā€œimmortaliseā€: ā€œimmortalizeā€,
ā€œimmortalisedā€: ā€œimmortalizedā€,
ā€œimmortalisesā€: ā€œimmortalizesā€,
ā€œimmortalisingā€: ā€œimmortalizingā€,
ā€œimmunisationā€: ā€œimmunizationā€,
ā€œimmuniseā€: ā€œimmunizeā€,
ā€œimmunisedā€: ā€œimmunizedā€,
ā€œimmunisesā€: ā€œimmunizesā€,
ā€œimmunisingā€: ā€œimmunizingā€,
ā€œimpanelledā€: ā€œimpaneledā€,
ā€œimpanellingā€: ā€œimpanelingā€,
ā€œimperilledā€: ā€œimperiledā€,
ā€œimperillingā€: ā€œimperilingā€,
ā€œindividualiseā€: ā€œindividualizeā€,
ā€œindividualisedā€: ā€œindividualizedā€,
ā€œindividualisesā€: ā€œindividualizesā€,
ā€œindividualisingā€: ā€œindividualizingā€,
ā€œindustrialiseā€: ā€œindustrializeā€,
ā€œindustrialisedā€: ā€œindustrializedā€,
ā€œindustrialisesā€: ā€œindustrializesā€,
ā€œindustrialisingā€: ā€œindustrializingā€,
ā€œinflexionā€: ā€œinflectionā€,
ā€œinflexionsā€: ā€œinflectionsā€,
ā€œinitialiseā€: ā€œinitializeā€,
ā€œinitialisedā€: ā€œinitializedā€,
ā€œinitialisesā€: ā€œinitializesā€,
ā€œinitialisingā€: ā€œinitializingā€,
ā€œinitialledā€: ā€œinitialedā€,
ā€œinitiallingā€: ā€œinitialingā€,
ā€œinstalā€: ā€œinstallā€,
ā€œinstalmentā€: ā€œinstallmentā€,
ā€œinstalmentsā€: ā€œinstallmentsā€,
ā€œinstalsā€: ā€œinstallsā€,
ā€œinstilā€: ā€œinstillā€,
ā€œinstilsā€: ā€œinstillsā€,
ā€œinstitutionalisationā€: ā€œinstitutionalizationā€,
ā€œinstitutionaliseā€: ā€œinstitutionalizeā€,
ā€œinstitutionalisedā€: ā€œinstitutionalizedā€,
ā€œinstitutionalisesā€: ā€œinstitutionalizesā€,
ā€œinstitutionalisingā€: ā€œinstitutionalizingā€,
ā€œintellectualiseā€: ā€œintellectualizeā€,
ā€œintellectualisedā€: ā€œintellectualizedā€,
ā€œintellectualisesā€: ā€œintellectualizesā€,
ā€œintellectualisingā€: ā€œintellectualizingā€,
ā€œinternalisationā€: ā€œinternalizationā€,
ā€œinternaliseā€: ā€œinternalizeā€,
ā€œinternalisedā€: ā€œinternalizedā€,
ā€œinternalisesā€: ā€œinternalizesā€,
ā€œinternalisingā€: ā€œinternalizingā€,
ā€œinternationalisationā€: ā€œinternationalizationā€,
ā€œinternationaliseā€: ā€œinternationalizeā€,
ā€œinternationalisedā€: ā€œinternationalizedā€,
ā€œinternationalisesā€: ā€œinternationalizesā€,
ā€œinternationalisingā€: ā€œinternationalizingā€,
ā€œionisationā€: ā€œionizationā€,
ā€œioniseā€: ā€œionizeā€,
ā€œionisedā€: ā€œionizedā€,
ā€œioniserā€: ā€œionizerā€,
ā€œionisersā€: ā€œionizersā€,
ā€œionisesā€: ā€œionizesā€,
ā€œionisingā€: ā€œionizingā€,
ā€œitaliciseā€: ā€œitalicizeā€,
ā€œitalicisedā€: ā€œitalicizedā€,
ā€œitalicisesā€: ā€œitalicizesā€,
ā€œitalicisingā€: ā€œitalicizingā€,
ā€œitemiseā€: ā€œitemizeā€,
ā€œitemisedā€: ā€œitemizedā€,
ā€œitemisesā€: ā€œitemizesā€,
ā€œitemisingā€: ā€œitemizingā€,
ā€œjeopardiseā€: ā€œjeopardizeā€,
ā€œjeopardisedā€: ā€œjeopardizedā€,
ā€œjeopardisesā€: ā€œjeopardizesā€,
ā€œjeopardisingā€: ā€œjeopardizingā€,
ā€œjewelledā€: ā€œjeweledā€,
ā€œjewellerā€: ā€œjewelerā€,
ā€œjewellersā€: ā€œjewelersā€,
ā€œjewelleryā€: ā€œjewelryā€,
ā€œjudgementā€: ā€œjudgmentā€,
ā€œkilogrammeā€: ā€œkilogramā€,
ā€œkilogrammesā€: ā€œkilogramsā€,
ā€œkilometreā€: ā€œkilometerā€,
ā€œkilometresā€: ā€œkilometersā€,
ā€œlabelledā€: ā€œlabeledā€,
ā€œlabellingā€: ā€œlabelingā€,
ā€œlabourā€: ā€œlaborā€,
ā€œlabouredā€: ā€œlaboredā€,
ā€œlabourerā€: ā€œlaborerā€,
ā€œlabourersā€: ā€œlaborersā€,
ā€œlabouringā€: ā€œlaboringā€,
ā€œlaboursā€: ā€œlaborsā€,
ā€œlacklustreā€: ā€œlacklusterā€,
ā€œlegalisationā€: ā€œlegalizationā€,
ā€œlegaliseā€: ā€œlegalizeā€,
ā€œlegalisedā€: ā€œlegalizedā€,
ā€œlegalisesā€: ā€œlegalizesā€,
ā€œlegalisingā€: ā€œlegalizingā€,
ā€œlegitimiseā€: ā€œlegitimizeā€,
ā€œlegitimisedā€: ā€œlegitimizedā€,
ā€œlegitimisesā€: ā€œlegitimizesā€,
ā€œlegitimisingā€: ā€œlegitimizingā€,
ā€œleukaemiaā€: ā€œleukemiaā€,
ā€œlevelledā€: ā€œleveledā€,
ā€œlevellerā€: ā€œlevelerā€,
ā€œlevellersā€: ā€œlevelersā€,
ā€œlevellingā€: ā€œlevelingā€,
ā€œlibelledā€: ā€œlibeledā€,
ā€œlibellingā€: ā€œlibelingā€,
ā€œlibellousā€: ā€œlibelousā€,
ā€œliberalisationā€: ā€œliberalizationā€,
ā€œliberaliseā€: ā€œliberalizeā€,
ā€œliberalisedā€: ā€œliberalizedā€,
ā€œliberalisesā€: ā€œliberalizesā€,
ā€œliberalisingā€: ā€œliberalizingā€,
ā€œlicenceā€: ā€œlicenseā€,
ā€œlicencedā€: ā€œlicensedā€,
ā€œlicencesā€: ā€œlicensesā€,
ā€œlicencingā€: ā€œlicensingā€,
ā€œlikeableā€: ā€œlikableā€,
ā€œlionisationā€: ā€œlionizationā€,
ā€œlioniseā€: ā€œlionizeā€,
ā€œlionisedā€: ā€œlionizedā€,
ā€œlionisesā€: ā€œlionizesā€,
ā€œlionisingā€: ā€œlionizingā€,
ā€œliquidiseā€: ā€œliquidizeā€,
ā€œliquidisedā€: ā€œliquidizedā€,
ā€œliquidiserā€: ā€œliquidizerā€,
ā€œliquidisersā€: ā€œliquidizersā€,
ā€œliquidisesā€: ā€œliquidizesā€,
ā€œliquidisingā€: ā€œliquidizingā€,
ā€œlitreā€: ā€œliterā€,
ā€œlitresā€: ā€œlitersā€,
ā€œlocaliseā€: ā€œlocalizeā€,
ā€œlocalisedā€: ā€œlocalizedā€,
ā€œlocalisesā€: ā€œlocalizesā€,
ā€œlocalisingā€: ā€œlocalizingā€,
ā€œlouvreā€: ā€œlouverā€,
ā€œlouvredā€: ā€œlouveredā€,
ā€œlouvresā€: ā€œlouversā€,
ā€œlustreā€: ā€œlusterā€,
ā€œmagnetiseā€: ā€œmagnetizeā€,
ā€œmagnetisedā€: ā€œmagnetizedā€,
ā€œmagnetisesā€: ā€œmagnetizesā€,
ā€œmagnetisingā€: ā€œmagnetizingā€,
ā€œmanoeuvrabilityā€: ā€œmaneuverabilityā€,
ā€œmanoeuvrableā€: ā€œmaneuverableā€,
ā€œmanoeuvreā€: ā€œmaneuverā€,
ā€œmanoeuvredā€: ā€œmaneuveredā€,
ā€œmanoeuvresā€: ā€œmaneuversā€,
ā€œmanoeuvringā€: ā€œmaneuveringā€,
ā€œmanoeuvringsā€: ā€œmaneuveringsā€,
ā€œmarginalisationā€: ā€œmarginalizationā€,
ā€œmarginaliseā€: ā€œmarginalizeā€,
ā€œmarginalisedā€: ā€œmarginalizedā€,
ā€œmarginalisesā€: ā€œmarginalizesā€,
ā€œmarginalisingā€: ā€œmarginalizingā€,
ā€œmarshalledā€: ā€œmarshaledā€,
ā€œmarshallingā€: ā€œmarshalingā€,
ā€œmarvelledā€: ā€œmarveledā€,
ā€œmarvellingā€: ā€œmarvelingā€,
ā€œmarvellousā€: ā€œmarvelousā€,
ā€œmarvellouslyā€: ā€œmarvelouslyā€,
ā€œmaterialisationā€: ā€œmaterializationā€,
ā€œmaterialiseā€: ā€œmaterializeā€,
ā€œmaterialisedā€: ā€œmaterializedā€,
ā€œmaterialisesā€: ā€œmaterializesā€,
ā€œmaterialisingā€: ā€œmaterializingā€,
ā€œmaximisationā€: ā€œmaximizationā€,
ā€œmaximiseā€: ā€œmaximizeā€,
ā€œmaximisedā€: ā€œmaximizedā€,
ā€œmaximisesā€: ā€œmaximizesā€,
ā€œmaximisingā€: ā€œmaximizingā€,
ā€œmeagreā€: ā€œmeagerā€,
ā€œmechanisationā€: ā€œmechanizationā€,
ā€œmechaniseā€: ā€œmechanizeā€,
ā€œmechanisedā€: ā€œmechanizedā€,
ā€œmechanisesā€: ā€œmechanizesā€,
ā€œmechanisingā€: ā€œmechanizingā€,
ā€œmediaevalā€: ā€œmedievalā€,
ā€œmemorialiseā€: ā€œmemorializeā€,
ā€œmemorialisedā€: ā€œmemorializedā€,
ā€œmemorialisesā€: ā€œmemorializesā€,
ā€œmemorialisingā€: ā€œmemorializingā€,
ā€œmemoriseā€: ā€œmemorizeā€,
ā€œmemorisedā€: ā€œmemorizedā€,
ā€œmemorisesā€: ā€œmemorizesā€,
ā€œmemorisingā€: ā€œmemorizingā€,
ā€œmesmeriseā€: ā€œmesmerizeā€,
ā€œmesmerisedā€: ā€œmesmerizedā€,
ā€œmesmerisesā€: ā€œmesmerizesā€,
ā€œmesmerisingā€: ā€œmesmerizingā€,
ā€œmetaboliseā€: ā€œmetabolizeā€,
ā€œmetabolisedā€: ā€œmetabolizedā€,
ā€œmetabolisesā€: ā€œmetabolizesā€,
ā€œmetabolisingā€: ā€œmetabolizingā€,
ā€œmetreā€: ā€œmeterā€,
ā€œmetresā€: ā€œmetersā€,
ā€œmicrometreā€: ā€œmicrometerā€,
ā€œmicrometresā€: ā€œmicrometersā€,
ā€œmilitariseā€: ā€œmilitarizeā€,
ā€œmilitarisedā€: ā€œmilitarizedā€,
ā€œmilitarisesā€: ā€œmilitarizesā€,
ā€œmilitarisingā€: ā€œmilitarizingā€,
ā€œmilligrammeā€: ā€œmilligramā€,
ā€œmilligrammesā€: ā€œmilligramsā€,
ā€œmillilitreā€: ā€œmilliliterā€,
ā€œmillilitresā€: ā€œmillilitersā€,
ā€œmillimetreā€: ā€œmillimeterā€,
ā€œmillimetresā€: ā€œmillimetersā€,
ā€œminiaturisationā€: ā€œminiaturizationā€,
ā€œminiaturiseā€: ā€œminiaturizeā€,
ā€œminiaturisedā€: ā€œminiaturizedā€,
ā€œminiaturisesā€: ā€œminiaturizesā€,
ā€œminiaturisingā€: ā€œminiaturizingā€,
ā€œminibussesā€: ā€œminibusesā€,
ā€œminimiseā€: ā€œminimizeā€,
ā€œminimisedā€: ā€œminimizedā€,
ā€œminimisesā€: ā€œminimizesā€,
ā€œminimisingā€: ā€œminimizingā€,
ā€œmisbehaviourā€: ā€œmisbehaviorā€,
ā€œmisdemeanourā€: ā€œmisdemeanorā€,
ā€œmisdemeanoursā€: ā€œmisdemeanorsā€,
ā€œmisspeltā€: ā€œmisspelledā€,
ā€œmitreā€: ā€œmiterā€,
ā€œmitresā€: ā€œmitersā€,
ā€œmobilisationā€: ā€œmobilizationā€,
ā€œmobiliseā€: ā€œmobilizeā€,
ā€œmobilisedā€: ā€œmobilizedā€,
ā€œmobilisesā€: ā€œmobilizesā€,
ā€œmobilisingā€: ā€œmobilizingā€,
ā€œmodelledā€: ā€œmodeledā€,
ā€œmodellerā€: ā€œmodelerā€,
ā€œmodellersā€: ā€œmodelersā€,
ā€œmodellingā€: ā€œmodelingā€,
ā€œmoderniseā€: ā€œmodernizeā€,
ā€œmodernisedā€: ā€œmodernizedā€,
ā€œmodernisesā€: ā€œmodernizesā€,
ā€œmodernisingā€: ā€œmodernizingā€,
ā€œmoisturiseā€: ā€œmoisturizeā€,
ā€œmoisturisedā€: ā€œmoisturizedā€,
ā€œmoisturiserā€: ā€œmoisturizerā€,
ā€œmoisturisersā€: ā€œmoisturizersā€,
ā€œmoisturisesā€: ā€œmoisturizesā€,
ā€œmoisturisingā€: ā€œmoisturizingā€,
ā€œmonologueā€: ā€œmonologā€,
ā€œmonologuesā€: ā€œmonologsā€,
ā€œmonopolisationā€: ā€œmonopolizationā€,
ā€œmonopoliseā€: ā€œmonopolizeā€,
ā€œmonopolisedā€: ā€œmonopolizedā€,
ā€œmonopolisesā€: ā€œmonopolizesā€,
ā€œmonopolisingā€: ā€œmonopolizingā€,
ā€œmoraliseā€: ā€œmoralizeā€,
ā€œmoralisedā€: ā€œmoralizedā€,
ā€œmoralisesā€: ā€œmoralizesā€,
ā€œmoralisingā€: ā€œmoralizingā€,
ā€œmotorisedā€: ā€œmotorizedā€,
ā€œmouldā€: ā€œmoldā€,
ā€œmouldedā€: ā€œmoldedā€,
ā€œmoulderā€: ā€œmolderā€,
ā€œmoulderedā€: ā€œmolderedā€,
ā€œmoulderingā€: ā€œmolderingā€,
ā€œmouldersā€: ā€œmoldersā€,
ā€œmouldierā€: ā€œmoldierā€,
ā€œmouldiestā€: ā€œmoldiestā€,
ā€œmouldingā€: ā€œmoldingā€,
ā€œmouldingsā€: ā€œmoldingsā€,
ā€œmouldsā€: ā€œmoldsā€,
ā€œmouldyā€: ā€œmoldyā€,
ā€œmoultā€: ā€œmoltā€,
ā€œmoultedā€: ā€œmoltedā€,
ā€œmoultingā€: ā€œmoltingā€,
ā€œmoultsā€: ā€œmoltsā€,
ā€œmoustacheā€: ā€œmustacheā€,
ā€œmoustachedā€: ā€œmustachedā€,
ā€œmoustachesā€: ā€œmustachesā€,
ā€œmoustachioedā€: ā€œmustachioedā€,
ā€œmulticolouredā€: ā€œmulticoloredā€,
ā€œnationalisationā€: ā€œnationalizationā€,
ā€œnationalisationsā€: ā€œnationalizationsā€,
ā€œnationaliseā€: ā€œnationalizeā€,
ā€œnationalisedā€: ā€œnationalizedā€,
ā€œnationalisesā€: ā€œnationalizesā€,
ā€œnationalisingā€: ā€œnationalizingā€,
ā€œnaturalisationā€: ā€œnaturalizationā€,
ā€œnaturaliseā€: ā€œnaturalizeā€,
ā€œnaturalisedā€: ā€œnaturalizedā€,
ā€œnaturalisesā€: ā€œnaturalizesā€,
ā€œnaturalisingā€: ā€œnaturalizingā€,
ā€œneighbourā€: ā€œneighborā€,
ā€œneighbourhoodā€: ā€œneighborhoodā€,
ā€œneighbourhoodsā€: ā€œneighborhoodsā€,
ā€œneighbouringā€: ā€œneighboringā€,
ā€œneighbourlinessā€: ā€œneighborlinessā€,
ā€œneighbourlyā€: ā€œneighborlyā€,
ā€œneighboursā€: ā€œneighborsā€,
ā€œneutralisationā€: ā€œneutralizationā€,
ā€œneutraliseā€: ā€œneutralizeā€,
ā€œneutralisedā€: ā€œneutralizedā€,
ā€œneutralisesā€: ā€œneutralizesā€,
ā€œneutralisingā€: ā€œneutralizingā€,
ā€œnormalisationā€: ā€œnormalizationā€,
ā€œnormaliseā€: ā€œnormalizeā€,
ā€œnormalisedā€: ā€œnormalizedā€,
ā€œnormalisesā€: ā€œnormalizesā€,
ā€œnormalisingā€: ā€œnormalizingā€,
ā€œodourā€: ā€œodorā€,
ā€œodourlessā€: ā€œodorlessā€,
ā€œodoursā€: ā€œodorsā€,
ā€œoesophagusā€: ā€œesophagusā€,
ā€œoesophagusesā€: ā€œesophagusesā€,
ā€œoestrogenā€: ā€œestrogenā€,
ā€œoffenceā€: ā€œoffenseā€,
ā€œoffencesā€: ā€œoffensesā€,
ā€œomeletteā€: ā€œomeletā€,
ā€œomelettesā€: ā€œomeletsā€,
ā€œoptimiseā€: ā€œoptimizeā€,
ā€œoptimisedā€: ā€œoptimizedā€,
ā€œoptimisesā€: ā€œoptimizesā€,
ā€œoptimisingā€: ā€œoptimizingā€,
ā€œorganisationā€: ā€œorganizationā€,
ā€œorganisationalā€: ā€œorganizationalā€,
ā€œorganisationsā€: ā€œorganizationsā€,
ā€œorganiseā€: ā€œorganizeā€,
ā€œorganisedā€: ā€œorganizedā€,
ā€œorganiserā€: ā€œorganizerā€,
ā€œorganisersā€: ā€œorganizersā€,
ā€œorganisesā€: ā€œorganizesā€,
ā€œorganisingā€: ā€œorganizingā€,
ā€œorthopaedicā€: ā€œorthopedicā€,
ā€œorthopaedicsā€: ā€œorthopedicsā€,
ā€œostraciseā€: ā€œostracizeā€,
ā€œostracisedā€: ā€œostracizedā€,
ā€œostracisesā€: ā€œostracizesā€,
ā€œostracisingā€: ā€œostracizingā€,
ā€œoutmanoeuvreā€: ā€œoutmaneuverā€,
ā€œoutmanoeuvredā€: ā€œoutmaneuveredā€,
ā€œoutmanoeuvresā€: ā€œoutmaneuversā€,
ā€œoutmanoeuvringā€: ā€œoutmaneuveringā€,
ā€œoveremphasiseā€: ā€œoveremphasizeā€,
ā€œoveremphasisedā€: ā€œoveremphasizedā€,
ā€œoveremphasisesā€: ā€œoveremphasizesā€,
ā€œoveremphasisingā€: ā€œoveremphasizingā€,
ā€œoxidisationā€: ā€œoxidizationā€,
ā€œoxidiseā€: ā€œoxidizeā€,
ā€œoxidisedā€: ā€œoxidizedā€,
ā€œoxidisesā€: ā€œoxidizesā€,
ā€œoxidisingā€: ā€œoxidizingā€,
ā€œpaederastā€: ā€œpederastā€,
ā€œpaederastsā€: ā€œpederastsā€,
ā€œpaediatricā€: ā€œpediatricā€,
ā€œpaediatricianā€: ā€œpediatricianā€,
ā€œpaediatriciansā€: ā€œpediatriciansā€,
ā€œpaediatricsā€: ā€œpediatricsā€,
ā€œpaedophileā€: ā€œpedophileā€,
ā€œpaedophilesā€: ā€œpedophilesā€,
ā€œpaedophiliaā€: ā€œpedophiliaā€,
ā€œpalaeolithicā€: ā€œpaleolithicā€,
ā€œpalaeontologistā€: ā€œpaleontologistā€,
ā€œpalaeontologistsā€: ā€œpaleontologistsā€,
ā€œpalaeontologyā€: ā€œpaleontologyā€,
ā€œpanelledā€: ā€œpaneledā€,
ā€œpanellingā€: ā€œpanelingā€,
ā€œpanellistā€: ā€œpanelistā€,
ā€œpanellistsā€: ā€œpanelistsā€,
ā€œparalyseā€: ā€œparalyzeā€,
ā€œparalysedā€: ā€œparalyzedā€,
ā€œparalysesā€: ā€œparalyzesā€,
ā€œparalysingā€: ā€œparalyzingā€,
ā€œparcelledā€: ā€œparceledā€,
ā€œparcellingā€: ā€œparcelingā€,
ā€œparlourā€: ā€œparlorā€,
ā€œparloursā€: ā€œparlorsā€,
ā€œparticulariseā€: ā€œparticularizeā€,
ā€œparticularisedā€: ā€œparticularizedā€,
ā€œparticularisesā€: ā€œparticularizesā€,
ā€œparticularisingā€: ā€œparticularizingā€,
ā€œpassivisationā€: ā€œpassivizationā€,
ā€œpassiviseā€: ā€œpassivizeā€,
ā€œpassivisedā€: ā€œpassivizedā€,
ā€œpassivisesā€: ā€œpassivizesā€,
ā€œpassivisingā€: ā€œpassivizingā€,
ā€œpasteurisationā€: ā€œpasteurizationā€,
ā€œpasteuriseā€: ā€œpasteurizeā€,
ā€œpasteurisedā€: ā€œpasteurizedā€,
ā€œpasteurisesā€: ā€œpasteurizesā€,
ā€œpasteurisingā€: ā€œpasteurizingā€,
ā€œpatroniseā€: ā€œpatronizeā€,
ā€œpatronisedā€: ā€œpatronizedā€,
ā€œpatronisesā€: ā€œpatronizesā€,
ā€œpatronisingā€: ā€œpatronizingā€,
ā€œpatronisinglyā€: ā€œpatronizinglyā€,
ā€œpedalledā€: ā€œpedaledā€,
ā€œpedallingā€: ā€œpedalingā€,
ā€œpedestrianisationā€: ā€œpedestrianizationā€,
ā€œpedestrianiseā€: ā€œpedestrianizeā€,
ā€œpedestrianisedā€: ā€œpedestrianizedā€,
ā€œpedestrianisesā€: ā€œpedestrianizesā€,
ā€œpedestrianisingā€: ā€œpedestrianizingā€,
ā€œpenaliseā€: ā€œpenalizeā€,
ā€œpenalisedā€: ā€œpenalizedā€,
ā€œpenalisesā€: ā€œpenalizesā€,
ā€œpenalisingā€: ā€œpenalizingā€,
ā€œpencilledā€: ā€œpenciledā€,
ā€œpencillingā€: ā€œpencilingā€,
ā€œpersonaliseā€: ā€œpersonalizeā€,
ā€œpersonalisedā€: ā€œpersonalizedā€,
ā€œpersonalisesā€: ā€œpersonalizesā€,
ā€œpersonalisingā€: ā€œpersonalizingā€,
ā€œpharmacopoeiaā€: ā€œpharmacopeiaā€,
ā€œpharmacopoeiasā€: ā€œpharmacopeiasā€,
ā€œphilosophiseā€: ā€œphilosophizeā€,
ā€œphilosophisedā€: ā€œphilosophizedā€,
ā€œphilosophisesā€: ā€œphilosophizesā€,
ā€œphilosophisingā€: ā€œphilosophizingā€,
"philtre": "philter",
"philtres": "philters",
ā€œphoneyā€: ā€œphonyā€,
ā€œplagiariseā€: ā€œplagiarizeā€,
ā€œplagiarisedā€: ā€œplagiarizedā€,
ā€œplagiarisesā€: ā€œplagiarizesā€,
ā€œplagiarisingā€: ā€œplagiarizingā€,
ā€œploughā€: ā€œplowā€,
ā€œploughedā€: ā€œplowedā€,
ā€œploughingā€: ā€œplowingā€,
ā€œploughmanā€: ā€œplowmanā€,
ā€œploughmenā€: ā€œplowmenā€,
ā€œploughsā€: ā€œplowsā€,
ā€œploughshareā€: ā€œplowshareā€,
ā€œploughsharesā€: ā€œplowsharesā€,
ā€œpolarisationā€: ā€œpolarizationā€,
ā€œpolariseā€: ā€œpolarizeā€,
ā€œpolarisedā€: ā€œpolarizedā€,
ā€œpolarisesā€: ā€œpolarizesā€,
ā€œpolarisingā€: ā€œpolarizingā€,
ā€œpoliticisationā€: ā€œpoliticizationā€,
ā€œpoliticiseā€: ā€œpoliticizeā€,
ā€œpoliticisedā€: ā€œpoliticizedā€,
ā€œpoliticisesā€: ā€œpoliticizesā€,
ā€œpoliticisingā€: ā€œpoliticizingā€,
ā€œpopularisationā€: ā€œpopularizationā€,
ā€œpopulariseā€: ā€œpopularizeā€,
ā€œpopularisedā€: ā€œpopularizedā€,
ā€œpopularisesā€: ā€œpopularizesā€,
ā€œpopularisingā€: ā€œpopularizingā€,
ā€œpouffeā€: ā€œpoufā€,
ā€œpouffesā€: ā€œpoufsā€,
ā€œpractiseā€: ā€œpracticeā€,
ā€œpractisedā€: ā€œpracticedā€,
ā€œpractisesā€: ā€œpracticesā€,
ā€œpractisingā€: ā€œpracticingā€,
ā€œpraesidiumā€: ā€œpresidiumā€,
ā€œpraesidiumsā€: ā€œpresidiumsā€,
ā€œpressurisationā€: ā€œpressurizationā€,
ā€œpressuriseā€: ā€œpressurizeā€,
ā€œpressurisedā€: ā€œpressurizedā€,
ā€œpressurisesā€: ā€œpressurizesā€,
ā€œpressurisingā€: ā€œpressurizingā€,
ā€œpretenceā€: ā€œpretenseā€,
ā€œpretencesā€: ā€œpretensesā€,
ā€œprimaevalā€: ā€œprimevalā€,
ā€œprioritisationā€: ā€œprioritizationā€,
ā€œprioritiseā€: ā€œprioritizeā€,
ā€œprioritisedā€: ā€œprioritizedā€,
ā€œprioritisesā€: ā€œprioritizesā€,
ā€œprioritisingā€: ā€œprioritizingā€,
ā€œprivatisationā€: ā€œprivatizationā€,
ā€œprivatisationsā€: ā€œprivatizationsā€,
ā€œprivatiseā€: ā€œprivatizeā€,
ā€œprivatisedā€: ā€œprivatizedā€,
ā€œprivatisesā€: ā€œprivatizesā€,
ā€œprivatisingā€: ā€œprivatizingā€,
ā€œprofessionalisationā€: ā€œprofessionalizationā€,
ā€œprofessionaliseā€: ā€œprofessionalizeā€,
ā€œprofessionalisedā€: ā€œprofessionalizedā€,
ā€œprofessionalisesā€: ā€œprofessionalizesā€,
ā€œprofessionalisingā€: ā€œprofessionalizingā€,
ā€œprogrammeā€: ā€œprogramā€,
ā€œprogrammesā€: ā€œprogramsā€,
ā€œprologueā€: ā€œprologā€,
ā€œprologuesā€: ā€œprologsā€,
ā€œpropagandiseā€: ā€œpropagandizeā€,
ā€œpropagandisedā€: ā€œpropagandizedā€,
ā€œpropagandisesā€: ā€œpropagandizesā€,
ā€œpropagandisingā€: ā€œpropagandizingā€,
ā€œproselytiseā€: ā€œproselytizeā€,
ā€œproselytisedā€: ā€œproselytizedā€,
ā€œproselytiserā€: ā€œproselytizerā€,
ā€œproselytisersā€: ā€œproselytizersā€,
ā€œproselytisesā€: ā€œproselytizesā€,
ā€œproselytisingā€: ā€œproselytizingā€,
ā€œpsychoanalyseā€: ā€œpsychoanalyzeā€,
ā€œpsychoanalysedā€: ā€œpsychoanalyzedā€,
ā€œpsychoanalysesā€: ā€œpsychoanalyzesā€,
ā€œpsychoanalysingā€: ā€œpsychoanalyzingā€,
ā€œpubliciseā€: ā€œpublicizeā€,
ā€œpublicisedā€: ā€œpublicizedā€,
ā€œpublicisesā€: ā€œpublicizesā€,
ā€œpublicisingā€: ā€œpublicizingā€,
ā€œpulverisationā€: ā€œpulverizationā€,
ā€œpulveriseā€: ā€œpulverizeā€,
ā€œpulverisedā€: ā€œpulverizedā€,
ā€œpulverisesā€: ā€œpulverizesā€,
ā€œpulverisingā€: ā€œpulverizingā€,
"pummelled": "pummeled",
"pummelling": "pummeling",
ā€œpyjamaā€: ā€œpajamaā€,
ā€œpyjamasā€: ā€œpajamasā€,
ā€œpzazzā€: ā€œpizzazzā€,
ā€œquarrelledā€: ā€œquarreledā€,
ā€œquarrellingā€: ā€œquarrelingā€,
ā€œradicaliseā€: ā€œradicalizeā€,
ā€œradicalisedā€: ā€œradicalizedā€,
ā€œradicalisesā€: ā€œradicalizesā€,
ā€œradicalisingā€: ā€œradicalizingā€,
ā€œrancourā€: ā€œrancorā€,
ā€œrandomiseā€: ā€œrandomizeā€,
ā€œrandomisedā€: ā€œrandomizedā€,
ā€œrandomisesā€: ā€œrandomizesā€,
ā€œrandomisingā€: ā€œrandomizingā€,
ā€œrationalisationā€: ā€œrationalizationā€,
ā€œrationalisationsā€: ā€œrationalizationsā€,
ā€œrationaliseā€: ā€œrationalizeā€,
ā€œrationalisedā€: ā€œrationalizedā€,
ā€œrationalisesā€: ā€œrationalizesā€,
ā€œrationalisingā€: ā€œrationalizingā€,
ā€œravelledā€: ā€œraveledā€,
ā€œravellingā€: ā€œravelingā€,
ā€œrealisableā€: ā€œrealizableā€,
ā€œrealisationā€: ā€œrealizationā€,
ā€œrealisationsā€: ā€œrealizationsā€,
ā€œrealiseā€: ā€œrealizeā€,
ā€œrealisedā€: ā€œrealizedā€,
ā€œrealisesā€: ā€œrealizesā€,
ā€œrealisingā€: ā€œrealizingā€,
ā€œrecognisableā€: ā€œrecognizableā€,
ā€œrecognisablyā€: ā€œrecognizablyā€,
ā€œrecognisanceā€: ā€œrecognizanceā€,
ā€œrecogniseā€: ā€œrecognizeā€,
ā€œrecognisedā€: ā€œrecognizedā€,
ā€œrecognisesā€: ā€œrecognizesā€,
ā€œrecognisingā€: ā€œrecognizingā€,
ā€œreconnoitreā€: ā€œreconnoiterā€,
ā€œreconnoitredā€: ā€œreconnoiteredā€,
ā€œreconnoitresā€: ā€œreconnoitersā€,
ā€œreconnoitringā€: ā€œreconnoiteringā€,
ā€œrefuelledā€: ā€œrefueledā€,
ā€œrefuellingā€: ā€œrefuelingā€,
ā€œregularisationā€: ā€œregularizationā€,
ā€œregulariseā€: ā€œregularizeā€,
ā€œregularisedā€: ā€œregularizedā€,
ā€œregularisesā€: ā€œregularizesā€,
ā€œregularisingā€: ā€œregularizingā€,
ā€œremodelledā€: ā€œremodeledā€,
ā€œremodellingā€: ā€œremodelingā€,
ā€œremouldā€: ā€œremoldā€,
ā€œremouldedā€: ā€œremoldedā€,
ā€œremouldingā€: ā€œremoldingā€,
ā€œremouldsā€: ā€œremoldsā€,
ā€œreorganisationā€: ā€œreorganizationā€,
ā€œreorganisationsā€: ā€œreorganizationsā€,
ā€œreorganiseā€: ā€œreorganizeā€,
ā€œreorganisedā€: ā€œreorganizedā€,
ā€œreorganisesā€: ā€œreorganizesā€,
ā€œreorganisingā€: ā€œreorganizingā€,
ā€œrevelledā€: ā€œreveledā€,
ā€œrevellerā€: ā€œrevelerā€,
ā€œrevellersā€: ā€œrevelersā€,
ā€œrevellingā€: ā€œrevelingā€,
ā€œrevitaliseā€: ā€œrevitalizeā€,
ā€œrevitalisedā€: ā€œrevitalizedā€,
ā€œrevitalisesā€: ā€œrevitalizesā€,
ā€œrevitalisingā€: ā€œrevitalizingā€,
ā€œrevolutioniseā€: ā€œrevolutionizeā€,
ā€œrevolutionisedā€: ā€œrevolutionizedā€,
ā€œrevolutionisesā€: ā€œrevolutionizesā€,
ā€œrevolutionisingā€: ā€œrevolutionizingā€,
ā€œrhapsodiseā€: ā€œrhapsodizeā€,
ā€œrhapsodisedā€: ā€œrhapsodizedā€,
ā€œrhapsodisesā€: ā€œrhapsodizesā€,
ā€œrhapsodisingā€: ā€œrhapsodizingā€,
ā€œrigourā€: ā€œrigorā€,
ā€œrigoursā€: ā€œrigorsā€,
ā€œritualisedā€: ā€œritualizedā€,
ā€œrivalledā€: ā€œrivaledā€,
ā€œrivallingā€: ā€œrivalingā€,
ā€œromanticiseā€: ā€œromanticizeā€,
ā€œromanticisedā€: ā€œromanticizedā€,
ā€œromanticisesā€: ā€œromanticizesā€,
ā€œromanticisingā€: ā€œromanticizingā€,
ā€œrumourā€: ā€œrumorā€,
ā€œrumouredā€: ā€œrumoredā€,
ā€œrumoursā€: ā€œrumorsā€,
ā€œsabreā€: ā€œsaberā€,
ā€œsabresā€: ā€œsabersā€,
ā€œsaltpetreā€: ā€œsaltpeterā€,
ā€œsanitiseā€: ā€œsanitizeā€,
ā€œsanitisedā€: ā€œsanitizedā€,
ā€œsanitisesā€: ā€œsanitizesā€,
ā€œsanitisingā€: ā€œsanitizingā€,
ā€œsatiriseā€: ā€œsatirizeā€,
ā€œsatirisedā€: ā€œsatirizedā€,
ā€œsatirisesā€: ā€œsatirizesā€,
ā€œsatirisingā€: ā€œsatirizingā€,
ā€œsaviourā€: ā€œsaviorā€,
ā€œsavioursā€: ā€œsaviorsā€,
ā€œsavourā€: ā€œsavorā€,
ā€œsavouredā€: ā€œsavoredā€,
ā€œsavouriesā€: ā€œsavoriesā€,
ā€œsavouringā€: ā€œsavoringā€,
ā€œsavoursā€: ā€œsavorsā€,
ā€œsavouryā€: ā€œsavoryā€,
ā€œscandaliseā€: ā€œscandalizeā€,
ā€œscandalisedā€: ā€œscandalizedā€,
ā€œscandalisesā€: ā€œscandalizesā€,
ā€œscandalisingā€: ā€œscandalizingā€,
ā€œscepticā€: ā€œskepticā€,
ā€œscepticalā€: ā€œskepticalā€,
ā€œscepticallyā€: ā€œskepticallyā€,
ā€œscepticismā€: ā€œskepticismā€,
ā€œscepticsā€: ā€œskepticsā€,
ā€œsceptreā€: ā€œscepterā€,
ā€œsceptresā€: ā€œsceptersā€,
ā€œscrutiniseā€: ā€œscrutinizeā€,
ā€œscrutinisedā€: ā€œscrutinizedā€,
ā€œscrutinisesā€: ā€œscrutinizesā€,
ā€œscrutinisingā€: ā€œscrutinizingā€,
ā€œsecularisationā€: ā€œsecularizationā€,
ā€œseculariseā€: ā€œsecularizeā€,
ā€œsecularisedā€: ā€œsecularizedā€,
ā€œsecularisesā€: ā€œsecularizesā€,
ā€œsecularisingā€: ā€œsecularizingā€,
ā€œsensationaliseā€: ā€œsensationalizeā€,
ā€œsensationalisedā€: ā€œsensationalizedā€,
ā€œsensationalisesā€: ā€œsensationalizesā€,
ā€œsensationalisingā€: ā€œsensationalizingā€,
ā€œsensitiseā€: ā€œsensitizeā€,
ā€œsensitisedā€: ā€œsensitizedā€,
ā€œsensitisesā€: ā€œsensitizesā€,
ā€œsensitisingā€: ā€œsensitizingā€,
ā€œsentimentaliseā€: ā€œsentimentalizeā€,
ā€œsentimentalisedā€: ā€œsentimentalizedā€,
ā€œsentimentalisesā€: ā€œsentimentalizesā€,
ā€œsentimentalisingā€: ā€œsentimentalizingā€,
ā€œsepulchreā€: ā€œsepulcherā€,
ā€œsepulchresā€: ā€œsepulchersā€,
ā€œserialisationā€: ā€œserializationā€,
ā€œserialisationsā€: ā€œserializationsā€,
ā€œserialiseā€: ā€œserializeā€,
ā€œserialisedā€: ā€œserializedā€,
ā€œserialisesā€: ā€œserializesā€,
ā€œserialisingā€: ā€œserializingā€,
ā€œsermoniseā€: ā€œsermonizeā€,
ā€œsermonisedā€: ā€œsermonizedā€,
ā€œsermonisesā€: ā€œsermonizesā€,
ā€œsermonisingā€: ā€œsermonizingā€,
ā€œsheikhā€: ā€œsheikā€,
ā€œshovelledā€: ā€œshoveledā€,
ā€œshovellingā€: ā€œshovelingā€,
ā€œshrivelledā€: ā€œshriveledā€,
ā€œshrivellingā€: ā€œshrivelingā€,
ā€œsignaliseā€: ā€œsignalizeā€,
ā€œsignalisedā€: ā€œsignalizedā€,
ā€œsignalisesā€: ā€œsignalizesā€,
ā€œsignalisingā€: ā€œsignalizingā€,
ā€œsignalledā€: ā€œsignaledā€,
ā€œsignallingā€: ā€œsignalingā€,
ā€œsmoulderā€: ā€œsmolderā€,
ā€œsmoulderedā€: ā€œsmolderedā€,
ā€œsmoulderingā€: ā€œsmolderingā€,
ā€œsmouldersā€: ā€œsmoldersā€,
ā€œsnivelledā€: ā€œsniveledā€,
ā€œsnivellingā€: ā€œsnivelingā€,
ā€œsnorkelledā€: ā€œsnorkeledā€,
ā€œsnorkellingā€: ā€œsnorkelingā€,
ā€œsnowploughā€: ā€œsnowplowā€,
"snowploughs": "snowplows",
ā€œsocialisationā€: ā€œsocializationā€,
ā€œsocialiseā€: ā€œsocializeā€,
ā€œsocialisedā€: ā€œsocializedā€,
ā€œsocialisesā€: ā€œsocializesā€,
ā€œsocialisingā€: ā€œsocializingā€,
ā€œsodomiseā€: ā€œsodomizeā€,
ā€œsodomisedā€: ā€œsodomizedā€,
ā€œsodomisesā€: ā€œsodomizesā€,
ā€œsodomisingā€: ā€œsodomizingā€,
ā€œsolemniseā€: ā€œsolemnizeā€,
ā€œsolemnisedā€: ā€œsolemnizedā€,
ā€œsolemnisesā€: ā€œsolemnizesā€,
ā€œsolemnisingā€: ā€œsolemnizingā€,
ā€œsombreā€: ā€œsomberā€,
ā€œspecialisationā€: ā€œspecializationā€,
ā€œspecialisationsā€: ā€œspecializationsā€,
ā€œspecialiseā€: ā€œspecializeā€,
ā€œspecialisedā€: ā€œspecializedā€,
ā€œspecialisesā€: ā€œspecializesā€,
ā€œspecialisingā€: ā€œspecializingā€,
ā€œspectreā€: ā€œspecterā€,
ā€œspectresā€: ā€œspectersā€,
ā€œspiralledā€: ā€œspiraledā€,
ā€œspirallingā€: ā€œspiralingā€,
ā€œsplendourā€: ā€œsplendorā€,
ā€œsplendoursā€: ā€œsplendorsā€,
ā€œsquirrelledā€: ā€œsquirreledā€,
ā€œsquirrellingā€: ā€œsquirrelingā€,
ā€œstabilisationā€: ā€œstabilizationā€,
ā€œstabiliseā€: ā€œstabilizeā€,
ā€œstabilisedā€: ā€œstabilizedā€,
ā€œstabiliserā€: ā€œstabilizerā€,
ā€œstabilisersā€: ā€œstabilizersā€,
ā€œstabilisesā€: ā€œstabilizesā€,
ā€œstabilisingā€: ā€œstabilizingā€,
ā€œstandardisationā€: ā€œstandardizationā€,
ā€œstandardiseā€: ā€œstandardizeā€,
ā€œstandardisedā€: ā€œstandardizedā€,
ā€œstandardisesā€: ā€œstandardizesā€,
ā€œstandardisingā€: ā€œstandardizingā€,
ā€œstencilledā€: ā€œstenciledā€,
ā€œstencillingā€: ā€œstencilingā€,
ā€œsterilisationā€: ā€œsterilizationā€,
ā€œsterilisationsā€: ā€œsterilizationsā€,
ā€œsteriliseā€: ā€œsterilizeā€,
ā€œsterilisedā€: ā€œsterilizedā€,
ā€œsteriliserā€: ā€œsterilizerā€,
ā€œsterilisersā€: ā€œsterilizersā€,
ā€œsterilisesā€: ā€œsterilizesā€,
ā€œsterilisingā€: ā€œsterilizingā€,
ā€œstigmatisationā€: ā€œstigmatizationā€,
ā€œstigmatiseā€: ā€œstigmatizeā€,
ā€œstigmatisedā€: ā€œstigmatizedā€,
ā€œstigmatisesā€: ā€œstigmatizesā€,
ā€œstigmatisingā€: ā€œstigmatizingā€,
ā€œstoreyā€: ā€œstoryā€,
ā€œstoreysā€: ā€œstoriesā€,
ā€œsubsidisationā€: ā€œsubsidizationā€,
ā€œsubsidiseā€: ā€œsubsidizeā€,
ā€œsubsidisedā€: ā€œsubsidizedā€,
ā€œsubsidiserā€: ā€œsubsidizerā€,
ā€œsubsidisersā€: ā€œsubsidizersā€,
ā€œsubsidisesā€: ā€œsubsidizesā€,
ā€œsubsidisingā€: ā€œsubsidizingā€,
ā€œsuccourā€: ā€œsuccorā€,
ā€œsuccouredā€: ā€œsuccoredā€,
ā€œsuccouringā€: ā€œsuccoringā€,
ā€œsuccoursā€: ā€œsuccorsā€,
ā€œsulphateā€: ā€œsulfateā€,
ā€œsulphatesā€: ā€œsulfatesā€,
ā€œsulphideā€: ā€œsulfideā€,
ā€œsulphidesā€: ā€œsulfidesā€,
ā€œsulphurā€: ā€œsulfurā€,
ā€œsulphurousā€: ā€œsulfurousā€,
ā€œsummariseā€: ā€œsummarizeā€,
ā€œsummarisedā€: ā€œsummarizedā€,
ā€œsummarisesā€: ā€œsummarizesā€,
ā€œsummarisingā€: ā€œsummarizingā€,
ā€œswivelledā€: ā€œswiveledā€,
ā€œswivellingā€: ā€œswivelingā€,
ā€œsymboliseā€: ā€œsymbolizeā€,
ā€œsymbolisedā€: ā€œsymbolizedā€,
ā€œsymbolisesā€: ā€œsymbolizesā€,
ā€œsymbolisingā€: ā€œsymbolizingā€,
ā€œsympathiseā€: ā€œsympathizeā€,
ā€œsympathisedā€: ā€œsympathizedā€,
ā€œsympathiserā€: ā€œsympathizerā€,
ā€œsympathisersā€: ā€œsympathizersā€,
ā€œsympathisesā€: ā€œsympathizesā€,
ā€œsympathisingā€: ā€œsympathizingā€,
ā€œsynchronisationā€: ā€œsynchronizationā€,
ā€œsynchroniseā€: ā€œsynchronizeā€,
ā€œsynchronisedā€: ā€œsynchronizedā€,
ā€œsynchronisesā€: ā€œsynchronizesā€,
ā€œsynchronisingā€: ā€œsynchronizingā€,
ā€œsynthesiseā€: ā€œsynthesizeā€,
ā€œsynthesisedā€: ā€œsynthesizedā€,
ā€œsynthesiserā€: ā€œsynthesizerā€,
ā€œsynthesisersā€: ā€œsynthesizersā€,
ā€œsynthesisesā€: ā€œsynthesizesā€,
ā€œsynthesisingā€: ā€œsynthesizingā€,
ā€œsyphonā€: ā€œsiphonā€,
ā€œsyphonedā€: ā€œsiphonedā€,
ā€œsyphoningā€: ā€œsiphoningā€,
ā€œsyphonsā€: ā€œsiphonsā€,
ā€œsystematisationā€: ā€œsystematizationā€,
ā€œsystematiseā€: ā€œsystematizeā€,
ā€œsystematisedā€: ā€œsystematizedā€,
ā€œsystematisesā€: ā€œsystematizesā€,
ā€œsystematisingā€: ā€œsystematizingā€,
ā€œtantaliseā€: ā€œtantalizeā€,
ā€œtantalisedā€: ā€œtantalizedā€,
ā€œtantalisesā€: ā€œtantalizesā€,
ā€œtantalisingā€: ā€œtantalizingā€,
ā€œtantalisinglyā€: ā€œtantalizinglyā€,
ā€œtasselledā€: ā€œtasseledā€,
ā€œtechnicolourā€: ā€œtechnicolorā€,
ā€œtemporiseā€: ā€œtemporizeā€,
ā€œtemporisedā€: ā€œtemporizedā€,
ā€œtemporisesā€: ā€œtemporizesā€,
ā€œtemporisingā€: ā€œtemporizingā€,
ā€œtenderiseā€: ā€œtenderizeā€,
ā€œtenderisedā€: ā€œtenderizedā€,
ā€œtenderisesā€: ā€œtenderizesā€,
ā€œtenderisingā€: ā€œtenderizingā€,
ā€œterroriseā€: ā€œterrorizeā€,
ā€œterrorisedā€: ā€œterrorizedā€,
ā€œterrorisesā€: ā€œterrorizesā€,
ā€œterrorisingā€: ā€œterrorizingā€,
ā€œtheatreā€: ā€œtheaterā€,
ā€œtheatregoerā€: ā€œtheatergoerā€,
ā€œtheatregoersā€: ā€œtheatergoersā€,
ā€œtheatresā€: ā€œtheatersā€,
ā€œtheoriseā€: ā€œtheorizeā€,
ā€œtheorisedā€: ā€œtheorizedā€,
ā€œtheorisesā€: ā€œtheorizesā€,
ā€œtheorisingā€: ā€œtheorizingā€,
ā€œtonneā€: ā€œtonā€,
ā€œtonnesā€: ā€œtonsā€,
ā€œtowelledā€: ā€œtoweledā€,
ā€œtowellingā€: ā€œtowelingā€,
ā€œtoxaemiaā€: ā€œtoxemiaā€,
ā€œtranquilliseā€: ā€œtranquilizeā€,
ā€œtranquillisedā€: ā€œtranquilizedā€,
ā€œtranquilliserā€: ā€œtranquilizerā€,
ā€œtranquillisersā€: ā€œtranquilizersā€,
ā€œtranquillisesā€: ā€œtranquilizesā€,
ā€œtranquillisingā€: ā€œtranquilizingā€,
ā€œtranquillityā€: ā€œtranquilityā€,
ā€œtranquillizeā€: ā€œtranquilizeā€,
ā€œtranquillizedā€: ā€œtranquilizedā€,
ā€œtranquillizerā€: ā€œtranquilizerā€,
ā€œtranquillizersā€: ā€œtranquilizersā€,
ā€œtranquillizesā€: ā€œtranquilizesā€,
ā€œtranquillizingā€: ā€œtranquilizingā€,
"tranquilly": "tranquilly",
ā€œtransistorisedā€: ā€œtransistorizedā€,
ā€œtraumatiseā€: ā€œtraumatizeā€,
ā€œtraumatisedā€: ā€œtraumatizedā€,
ā€œtraumatisesā€: ā€œtraumatizesā€,
ā€œtraumatisingā€: ā€œtraumatizingā€,
ā€œtravelledā€: ā€œtraveledā€,
ā€œtravellerā€: ā€œtravelerā€,
ā€œtravellersā€: ā€œtravelersā€,
ā€œtravellingā€: ā€œtravelingā€,
ā€œtravelogā€: ā€œtravelogueā€,
ā€œtravelogsā€: ā€œtraveloguesā€,
ā€œtrialledā€: ā€œtrialedā€,
ā€œtriallingā€: ā€œtrialingā€,
ā€œtricolourā€: ā€œtricolorā€,
ā€œtricoloursā€: ā€œtricolorsā€,
ā€œtrivialiseā€: ā€œtrivializeā€,
ā€œtrivialisedā€: ā€œtrivializedā€,
ā€œtrivialisesā€: ā€œtrivializesā€,
ā€œtrivialisingā€: ā€œtrivializingā€,
ā€œtumourā€: ā€œtumorā€,
ā€œtumoursā€: ā€œtumorsā€,
ā€œtunnelledā€: ā€œtunneledā€,
ā€œtunnellingā€: ā€œtunnelingā€,
ā€œtyranniseā€: ā€œtyrannizeā€,
ā€œtyrannisedā€: ā€œtyrannizedā€,
ā€œtyrannisesā€: ā€œtyrannizesā€,
ā€œtyrannisingā€: ā€œtyrannizingā€,
ā€œtyreā€: ā€œtireā€,
ā€œtyresā€: ā€œtiresā€,
ā€œunauthorisedā€: ā€œunauthorizedā€,
ā€œuncivilisedā€: ā€œuncivilizedā€,
ā€œunderutilisedā€: ā€œunderutilizedā€,
ā€œunequalledā€: ā€œunequaledā€,
ā€œunfavourableā€: ā€œunfavorableā€,
ā€œunfavourablyā€: ā€œunfavorablyā€,
ā€œunionisationā€: ā€œunionizationā€,
ā€œunioniseā€: ā€œunionizeā€,
ā€œunionisedā€: ā€œunionizedā€,
ā€œunionisesā€: ā€œunionizesā€,
ā€œunionisingā€: ā€œunionizingā€,
ā€œunorganisedā€: ā€œunorganizedā€,
ā€œunravelledā€: ā€œunraveledā€,
ā€œunravellingā€: ā€œunravelingā€,
ā€œunrecognisableā€: ā€œunrecognizableā€,
ā€œunrecognisedā€: ā€œunrecognizedā€,
ā€œunrivalledā€: ā€œunrivaledā€,
ā€œunsavouryā€: ā€œunsavoryā€,
ā€œuntrammelledā€: ā€œuntrammeledā€,
ā€œurbanisationā€: ā€œurbanizationā€,
ā€œurbaniseā€: ā€œurbanizeā€,
ā€œurbanisedā€: ā€œurbanizedā€,
ā€œurbanisesā€: ā€œurbanizesā€,
ā€œurbanisingā€: ā€œurbanizingā€,
ā€œutilisableā€: ā€œutilizableā€,
ā€œutilisationā€: ā€œutilizationā€,
ā€œutiliseā€: ā€œutilizeā€,
ā€œutilisedā€: ā€œutilizedā€,
ā€œutilisesā€: ā€œutilizesā€,
ā€œutilisingā€: ā€œutilizingā€,
ā€œvalourā€: ā€œvalorā€,
ā€œvandaliseā€: ā€œvandalizeā€,
ā€œvandalisedā€: ā€œvandalizedā€,
ā€œvandalisesā€: ā€œvandalizesā€,
ā€œvandalisingā€: ā€œvandalizingā€,
ā€œvaporisationā€: ā€œvaporizationā€,
ā€œvaporiseā€: ā€œvaporizeā€,
ā€œvaporisedā€: ā€œvaporizedā€,
ā€œvaporisesā€: ā€œvaporizesā€,
ā€œvaporisingā€: ā€œvaporizingā€,
ā€œvapourā€: ā€œvaporā€,
ā€œvapoursā€: ā€œvaporsā€,
ā€œverbaliseā€: ā€œverbalizeā€,
ā€œverbalisedā€: ā€œverbalizedā€,
ā€œverbalisesā€: ā€œverbalizesā€,
ā€œverbalisingā€: ā€œverbalizingā€,
ā€œvictimisationā€: ā€œvictimizationā€,
ā€œvictimiseā€: ā€œvictimizeā€,
ā€œvictimisedā€: ā€œvictimizedā€,
ā€œvictimisesā€: ā€œvictimizesā€,
ā€œvictimisingā€: ā€œvictimizingā€,
ā€œvideodiscā€: ā€œvideodiskā€,
ā€œvideodiscsā€: ā€œvideodisksā€,
ā€œvigourā€: ā€œvigorā€,
ā€œvisualisationā€: ā€œvisualizationā€,
ā€œvisualisationsā€: ā€œvisualizationsā€,
ā€œvisualiseā€: ā€œvisualizeā€,
ā€œvisualisedā€: ā€œvisualizedā€,
ā€œvisualisesā€: ā€œvisualizesā€,
ā€œvisualisingā€: ā€œvisualizingā€,
ā€œvocalisationā€: ā€œvocalizationā€,
ā€œvocalisationsā€: ā€œvocalizationsā€,
ā€œvocaliseā€: ā€œvocalizeā€,
ā€œvocalisedā€: ā€œvocalizedā€,
ā€œvocalisesā€: ā€œvocalizesā€,
ā€œvocalisingā€: ā€œvocalizingā€,
ā€œvulcanisedā€: ā€œvulcanizedā€,
ā€œvulgarisationā€: ā€œvulgarizationā€,
ā€œvulgariseā€: ā€œvulgarizeā€,
ā€œvulgarisedā€: ā€œvulgarizedā€,
ā€œvulgarisesā€: ā€œvulgarizesā€,
ā€œvulgarisingā€: ā€œvulgarizingā€,
ā€œwaggonā€: ā€œwagonā€,
ā€œwaggonsā€: ā€œwagonsā€,
ā€œwatercolourā€: ā€œwatercolorā€,
ā€œwatercoloursā€: ā€œwatercolorsā€,
ā€œweaselledā€: ā€œweaseledā€,
ā€œweasellingā€: ā€œweaselingā€,
ā€œwesternisationā€: ā€œwesternizationā€,
ā€œwesterniseā€: ā€œwesternizeā€,
ā€œwesternisedā€: ā€œwesternizedā€,
ā€œwesternisesā€: ā€œwesternizesā€,
ā€œwesternisingā€: ā€œwesternizingā€,
ā€œwomaniseā€: ā€œwomanizeā€,
ā€œwomanisedā€: ā€œwomanizedā€,
ā€œwomaniserā€: ā€œwomanizerā€,
ā€œwomanisersā€: ā€œwomanizersā€,
ā€œwomanisesā€: ā€œwomanizesā€,
ā€œwomanisingā€: ā€œwomanizingā€,
ā€œwoollenā€: ā€œwoolenā€,
ā€œwoollensā€: ā€œwoolensā€,
ā€œwoolliesā€: ā€œwooliesā€,
ā€œwoollyā€: ā€œwoolyā€,
ā€œworshippedā€: ā€œworshipedā€,
ā€œworshippingā€: ā€œworshipingā€,
ā€œworshipperā€: ā€œworshiperā€,
ā€œyodelledā€: ā€œyodeledā€,
ā€œyodellingā€: ā€œyodelingā€,
ā€œyoghourtā€: ā€œyogurtā€,
ā€œyoghourtsā€: ā€œyogurtsā€,
ā€œyoghurtā€: ā€œyogurtā€,
ā€œyoghurtsā€: ā€œyogurtsā€,
ā€œmhmā€: ā€œhmmā€,
ā€œmmmā€: ā€œhmmā€
}
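As a rough illustration of how a table like the one above might be used, the sketch below applies a four-entry excerpt of it word by word. The names `SPELLING_MAP` and `americanize` are hypothetical, and the sketch assumes the input is already lower-cased and free of punctuation; how `EnglishTextNormalizer` actually loads and applies the table is not shown in this section.

```python
from typing import Dict

# A small excerpt of the mapping above. The full table would normally be
# loaded from its JSON file (the file name is not shown in this section).
SPELLING_MAP: Dict[str, str] = {
    "practise": "practice",
    "programme": "program",
    "realised": "realized",
    "tyres": "tires",
}


def americanize(text: str, mapping: Dict[str, str] = SPELLING_MAP) -> str:
    """Replace every whitespace-separated word that has an entry in the mapping."""
    return " ".join(mapping.get(word, word) for word in text.split())


print(americanize("they realised the programme needed new tyres"))
# they realized the program needed new tires
```

Words without an entry pass through unchanged, so the lookup is safe to run on text that is already in American spelling.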
./utils/preprocess_text.py

import re
import unicodedata
from zhconv import convert
from typing import List
from .english import EnglishTextNormalizer
from .basic import BasicTextNormalizer

def read_transcription(folder):
    """Return the second column of every row in the CSV file at `folder`."""
    import csv
    second_column_data = []
    with open(folder, 'r', encoding='latin-1') as file:
        reader = csv.reader(file)
        for row in reader:
            if len(row) > 1:  # Make sure the row has at least two columns
                second_column_data.append(row[1])
    return second_column_data

def normalize_chinese(text):
    punctuation = '!,.;:?ć€ļ¼ļ¼Œć€‚ļ¼›ļ¼šļ¼Ÿ'
    if isinstance(text, str):
        text = re.sub(r'[{}]+'.format(punctuation), '', text).strip()  # Remove punctuation
        text = text.replace(' ', '')  # Remove spaces
        text = convert(text, 'zh-cn')  # Traditional -> Simplified Chinese
        return text
    elif isinstance(text, list):
        result_text = []
        for t in text:
            # Work on the loop variable `t`, not on the list `text`
            t = re.sub(r'[{}]+'.format(punctuation), '', t).strip()  # Remove punctuation
            t = t.replace(' ', '')  # Remove spaces
            t = convert(t, 'zh-cn')  # Traditional -> Simplified Chinese
            result_text.append(t)
        return result_text
    else:
        raise Exception(f'Unsupported type: {type(text)}')

def normalize_japanese_and_korean(text):
    # Apply NFKC normalization, which folds full-width and half-width
    # variants into their standard compatibility forms
    text = unicodedata.normalize('NFKC', text)

    # Remove punctuation and whitespace
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\s+', '', text)

    return text
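The effect of the NFKC step is easiest to see on a few inputs. The sketch below repeats the same three operations in a standalone function (the name `normalize_jk` is hypothetical) and runs it on full-width Latin characters, half-width katakana, and a punctuated Korean phrase.

```python
import re
import unicodedata


def normalize_jk(text: str) -> str:
    """NFKC-normalize, then drop punctuation and all whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", "", text)


print(normalize_jk("ABC123"))      # ABC123
print(normalize_jk("ļ½¶ļ¾€ļ½¶ļ¾…"))            # カタカナ
print(normalize_jk("ģ•ˆė…•ķ•˜ģ„øģš”, ģ„øź³„!"))   # ģ•ˆė…•ķ•˜ģ„øģš”ģ„øź³„
```

Full-width letters and digits come out as plain ASCII, while half-width katakana is widened to the standard forms, so a reference and a hypothesis that differ only in character width compare as equal.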

# English: normalization tailored to English text

def normalize_texts_english(texts: List[str]) -> List[str]:
    # Create an EnglishTextNormalizer instance
    normalizer = EnglishTextNormalizer()

    # Normalize each text
    normalized_texts = [normalizer(text) for text in texts]

    return normalized_texts

# Languages other than Chinese, Japanese, and Korean.
# For English, normalize_texts_english is the recommended choice.

def normalize_texts_multi_language(texts: List[str]) -> List[str]:
    # Create a BasicTextNormalizer instance
    normalizer = BasicTextNormalizer()

    # Normalize each text
    normalized_texts = [normalizer(text) for text in texts]

    return normalized_texts

# Chinese

def normalize_texts_chinese(texts):
    # Remove punctuation and spaces, and convert Traditional to Simplified Chinese
    texts = [normalize_chinese(text) for text in texts]
    return texts

# Japanese and Korean

def normalize_texts_japanese_korean(texts):
    texts = [normalize_japanese_and_korean(text) for text in texts]
    return texts


Complete Guide to Training a Real-Time Speech Recognition Paraformer Model: From Data Preparation to Model Evaluation
https://miku2024.top/posts/č®­ē»ƒå®žę—¶čÆ­éŸ³čÆ†åˆ«ParaformeręØ”åž‹/
Author
KB
Published on
November 6, 2024
License