č®ē»å®ę¶čÆé³čÆå«Paraformer樔åå®ę“ęåļ¼ä»ę°ę®åå¤å°ęØ”åčÆä¼°
š¤ åØäŗŗå·„ęŗč½åæ«éåå±ēä»å¤©ļ¼čÆé³čÆå«ęęÆå·²ē»ęäøŗä¼å¤åŗēØēę øåæćē¶čļ¼éēØčÆé³čÆå«ęØ”ååØē¹å®é¢åå¾å¾č”Øē°äøä½³ćę¬ęå°čƦē»ä»ē»å¦ä½åŗäŗFunASRę”ę¶č®ē»å®ę¶čÆé³čÆå«Paraformer樔åļ¼ä»ę°ę®åå¤ć樔åč®ē»å°ę§č½čÆä¼°ēå®ę“ęµēØļ¼åø®å©ä½ ę建éēØäŗē¹å®é¢åēé«ē²¾åŗ¦čÆé³čÆå«ē³»ē»ć
š¬ ē ē©¶čęÆäøåØęŗ
éēØęØ”åēå±éę§
åøé¢äøēčÆé³čÆå«ęØ”ååŗę¬äøé½ęÆéēØčÆå«ęØ”åļ¼åØē¹å®é¢åēåŗēØäøååØä»„äøé®é¢ļ¼
- åéēčæé«ļ¼åƹäŗäøäøęÆčÆåč”äøčÆę±čÆå«åē”®ēä½
- 读é³å·®å¼ļ¼äøåé¢åēę°å读é³č§čäøå
- äøäøęēč§£ļ¼ē¼ŗä¹ē¹å®åŗęÆēčÆčØęØ”åęÆę
é¢åē¹åēåæ č¦ę§
仄čŖē©ŗé¢åäøŗä¾ļ¼ååØčÆøå¤ęęļ¼
graph TD
A[éēØčÆé³čÆå«ęØ”å] --> B[čŖē©ŗé¢ååŗēØ]
B --> C[čÆå«åē”®ēä½]
B --> D[äøäøęÆčÆę ę³čÆå«]
B --> E[ę°å读é³äøå¹é
]
C --> F[éč¦é¢åē¹åč®ē»]
D --> F
E --> F
F --> G[ęåčÆå«ē²¾åŗ¦]
F --> H[éåŗäøäøåŗęÆ]
F --> I[éä½åéē]
å ·ä½é®é¢å ę¬ļ¼
- čŖē©ŗäøęåčÆļ¼å¦āē²éčæčæāćā建ē«čŖéāēļ¼
- ę°å读é³č§čäøę„åøøäøå
- ē¹å®éäæ”åč®®åęÆčÆ
č§£å³ę¹ę”ļ¼éčæåƹå¼ęŗParaformer樔åčæč”é¢åē¹åč®ē»ļ¼åÆä»„ę¾čęååØē¹å®åŗęÆäøēčÆå«åē”®ēć
š ę°ę®éåå¤
ę°ę®éę¦č§
ę¬ę¬”č®ē»ä½æēØēę°ę®éå ·ę仄äøē¹ē¹ļ¼
- ę°ę®č§ęØ”ļ¼34,090ę”é³é¢ę°ę®
- é³é¢ę ¼å¼ļ¼8kHzéę ·ēWAVęä»¶
- ę°ę®åå²ļ¼90%č®ē»éļ¼30,681ę”ļ¼+ 10%éŖčÆéļ¼3,409ę”ļ¼
- åŗēØé¢åļ¼čŖē©ŗéäæ”äøäøęÆčÆ
ę°ę®é蓨éč¦ę±ļ¼ē”®äæé³é¢ęø ę°åŗ¦čÆå„½ļ¼ę 注ęę¬åē”®ļ¼éæå åŖé³å¹²ę°å½±åč®ē»ęęć
ę°ę®ę ¼å¼č§č
FunASRę”ę¶č¦ę±ę°ę®éµå¾Ŗē¹å®ę ¼å¼ļ¼
ęę¬ę 注ęä»¶ (train_text.txt)
1 | |
é³é¢č·Æå¾ęä»¶ (train_wav.scp)
1 | |
ę ¼å¼č¦ę±ļ¼é³é¢IDåęä»¶č·Æå¾ä¹é“ēØē©ŗę ¼åéļ¼ē”®äæIDåØäø¤äøŖęä»¶äøå®å Øäøč“ć
ę°ę®é¢å¤ēčę¬
ē±äŗåå§ę°ę®ååøåØå¤äøŖåē®å½äøļ¼éč¦čæč”ē»äøę“ēć仄äøä»£ē å°åę£ēé³é¢ęä»¶åå¹¶å°ē»äøē®å½ļ¼
1 | |
šµ é³é¢éę ·ē转ę¢
åØęŗåØå¦ä¹ č®ē»äøļ¼ę°ę®äøč“ę§č³å ³éč¦ćå¦ęä½ ēé¢č®ē»ęØ”åęÆåŗäŗ16kHzé³é¢č®ē»ēļ¼é£ä¹ä½æēØēøåéę ·ēēę°ę®čæč”å¾®č°č½č·å¾ę“儽ēęęćę¬é”¹ē®åƹęÆäŗ8kHzå16kHzäø¤ē§éę ·ēēč®ē»ęęļ¼16kHz樔åēåéēęę¾ę“ä½ć
仄äøä»£ē 使ēØFFmpegčæč”é«ęēé³é¢éę ·ē转ę¢ļ¼éēØå¤čæēØå¹¶č”å¤ēå 快转ę¢éåŗ¦ļ¼
1 | |
ę§č½ä¼å建议ļ¼
- čæēØę°å»ŗč®®č®¾ē½®äøŗCPUę øåæę°ē70-80%
- ē”®äæęč¶³å¤ēē£ē空é“ååØč½¬ę¢åēé³é¢
- 16kHzéę ·ēåØčÆå«ē²¾åŗ¦åę件大å°ä¹é“ęä¾äŗčÆå„½ē平蔔
ę°ę®éę“ēå®ęåļ¼
- ęęč®ē»é³é¢ē»äøäæååØäøäøŖē®å½äø
- ęęę ē¾ę°ę®åå¹¶å°ē»äøēęę¬ęä»¶äø
- ęē §9:1ęÆä¾ååč®ē»éåéŖčÆé
- é³é¢éę ·ēē»äøč½¬ę¢äøŗ16kHzļ¼ęØčļ¼
š 樔åč®ē»
甬件č¦ę±
åØå¼å§č®ē»ä¹åļ¼čÆ·ē”®äæä½ ē甬件é 置滔足仄äøč¦ę±ļ¼
| é ē½®é”¹ē® | ęä½č¦ę± | ęØčé ē½® | ę¬ę¬”å®éŖ |
|---|---|---|---|
| ę¾å | 12GB | 24GB+ | 48GB (2ĆRTX4090) |
| å å | 32GB | 64GB+ | 64GB |
| ååØē©ŗé“ | 50GB | 100GB+ | 200GB |
| GPUę°é | 1å | 2å+ | 2å |
č®ē»ę¶é“é¢ä¼°ļ¼ę ¹ę®ę°ę®éå甬件é ē½®ļ¼å®ę“č®ē»čæēØåÆč½éč¦ę°å°ę¶å°ę°åå°ę¶äøēć
č®ē»čę¬é ē½®
FunASRęä¾äŗå®ę“ēč®ē»čę¬ęØ”ęæćē¼č¾ FunASR/examples/industrial_data_pretraining/paraformer_streaming/finetune.shļ¼
1 | |
š ļø č®ē»åę°čÆ¦č§£
äøč”ØčƦē»č§£éäŗåäøŖč®ē»åę°ēå«ä¹åęØč设置ļ¼
| åę° | é»č®¤å¼ | ęØččå“ | 诓ę |
|---|---|---|---|
| ę°ę®éåę° | |||
batch_size |
20000 | 10000-30000 | ęÆäøŖbatchētokenę°éļ¼åÆę ¹ę®ę¾åč°ę“ |
batch_type |
ātokenā | ātokenā/ālengthā | ę¹å¤ēē±»åļ¼tokenē±»åę“ēØ³å® |
num_workers |
4 | 2-8 | ę°ę®å 载线ēØę°ļ¼ę ¹ę®CPUę øåæę°č®¾å® |
| č®ē»åę° | |||
max_epoch |
50 | 20-200 | ę大č®ē»č½®ę°ļ¼é²ę¢čæęå |
log_interval |
1 | 1-10 | ę„åæč¾åŗé“éļ¼ę„ę°ļ¼ |
validate_interval |
2000 | 1000-5000 | éŖčÆéčÆä¼°é“éļ¼ę„ę°ļ¼ |
save_checkpoint_interval |
2000 | 1000-5000 | 樔åäæåé“éļ¼ę„ę°ļ¼ |
keep_nbest_models |
20 | 5-50 | äæēęä¼ęØ”åę°é |
| ä¼ååØåę° | |||
lr |
0.0002 | 0.0001-0.001 | åå§å¦ä¹ ēļ¼å¾®č°ę¶č®¾ē½®č¾å° |
åę°č°ä¼å»ŗč®®ļ¼
- batch_sizeļ¼ę ¹ę®ę¾å大å°č°ę“ļ¼ę¾åč¶å¤§åÆč®¾ē½®č¶å¤§ēbatch_size
- max_epochļ¼åꬔč®ē»å»ŗč®®č®¾ē½®äøŗ50-100ļ¼č§åÆę¶ęę åµåč°ę“
- å¦ä¹ ēļ¼å¾®č°é¢č®ē»ęØ”åę¶ļ¼å¦ä¹ ēäøå®čæå¤§ļ¼é²ę¢ē “ååęē¹å¾
š ę§č”č®ē»
åØé 置儽č®ē»čę¬åļ¼ęē §ä»„äøę„éŖ¤ę§č”č®ē»ļ¼
1. åå°čæč”č®ē»
1 | |
2. ēę§č®ē»čæåŗ¦
1 | |
3. åÆåØTensorBoardēę§
1 | |
č®ē»å®ęę åæļ¼
- č¾åŗē®å½äøēę
model.ptęä»¶ - éŖčÆéäøēę失å¼č¶äŗēسå®
- TensorBoardäøę¾ē¤ŗęø ę°ēę¶ęč¶åæ
š č®ē»čæåŗ¦ēę§
č®ē»čæēØäøéč¦å ³ę³Øä»„äøęę ļ¼
graph TD
A[č®ē»å¼å§] --> B[ēę§Lossę²ēŗæ]
B --> C[ę£ę„éŖčÆé蔨ē°]
C --> D{ęÆå¦čæęå?}
D -->|Yes| E[éä½å¦ä¹ ēęę©å]
D -->|No| F{ę¶ęäŗå?}
F -->|Yes| G[č®ē»å®ę]
F -->|No| H[ē»§ē»č®ē»]
H --> B
E --> B
å ³é®ęę 诓ęļ¼
- č®ē»ę失 (Train Loss)ļ¼åŗčÆ„ęē»äøé
- éŖčÆę失 (Valid Loss)ļ¼åŗčÆ„åę„äøéļ¼å¦ęäøåååÆč½čæęå
- åéē (CER)ļ¼čÆä¼°ęØ”ååØéŖčÆéäøē蔨ē°
šÆ 樔åęØē
樔åéØē½²åå¤
č®ē»å®ęåļ¼ä½ éč¦å°č®ē»å„½ē樔åęä»¶ęæę¢å°åå§ęØ”åē®å½äøļ¼
1 | |
éč¦ęéļ¼åØęæę¢ęØ”åęä»¶ä¹åļ¼čÆ·å”åæ å¤ä»½åå§ęØ”åļ¼ä»„é²ęå¤ę åµć
ęØē代ē å®ē°
仄äøęÆå®ę“ēęØē代ē ļ¼ęÆęå®ę¶ęµå¼čÆå«ļ¼
1 | |
š ęÆęå·„å ·ęä»¶
čÆä¼°čæēØäøéč¦ēęę¬é¢å¤ēå·„å ·ļ¼
1 | |
šÆ čÆä¼°ē»ę解读
čÆä¼°å®ęåļ¼ä½ å°å¾å°ä»„äøå ³é®äæ”ęÆļ¼
| ęę | ä¼ē§ | čÆå„½ | äøč¬ | éč¦ę¹čæ |
|---|---|---|---|---|
| äøęCER | < 5% | 5-10% | 10-20% | > 20% |
| č±ęWER | < 10% | 10-15% | 15-25% | > 25% |
čÆä¼°ęä½³å®č·µļ¼
- å¤ę ·åęµčÆé: ē”®äæęµčÆę°ę®č¦ēåē§åŗęÆå诓čÆäŗŗ
- 对ęÆåŗēŗæ: äøåå§ęŖč®ē»ęØ”åčæč”对ęÆļ¼éŖčÆč®ē»ęę
- é误åę: ę·±å „åęé误类åļ¼ę导åē»ä¼åę¹å
- ęē»ēę§: å®ęåØę°ę°ę®äøčÆä¼°ęØ”åę§č½
import re
šÆ ę»ē»äøå±ę
š 锹ē®ę»ē»
éčæę¬ęēå®ę“ęµēØļ¼ę们ęåå®ē°äŗåŗäŗFunASRę”ę¶ēParaformerčÆé³čÆå«ęØ”åé¢åē¹åč®ē»ć锹ē®ēäø»č¦ęęå ę¬ļ¼
ęęÆå®ē°ę¹é¢ļ¼
- ā å®ęäŗ34,090ę”čŖē©ŗé¢åčÆé³ę°ę®ēé¢å¤ēåę ¼å¼č½¬ę¢
- ā ęåč®ē»äŗéēØäŗčŖē©ŗéäæ”åŗęÆēäøēØčÆé³čÆå«ęØ”å
- ā å®ē°äŗé«ęēå®ę¶ęµå¼čÆé³čÆå«ęØēē³»ē»
- ā 建ē«äŗå®ę“ē樔åčÆä¼°åę§č½ēę§ä½ē³»
ę§č½ęåę¹é¢ļ¼
- š ēøęÆéēØęØ”åļ¼åØčŖē©ŗé¢åēåéēę¾čéä½
- š äøäøęÆčÆčÆå«åē”®ēå¤§å¹ ęå
- š ę°å读é³åäøäøåč®®čÆå«ę“å åē”®
š ęŖę„ę¹čæę¹å
ę°ę®å¢å¼ŗ
- ę¶éę“å¤ę ·åēčŖē©ŗéäæ”ę°ę®
- å¼å „ę°ę®å¢å¼ŗęęÆļ¼éåŗ¦ę°åØćåŖé³ę·»å ēļ¼
- 平蔔äøååŗęÆå诓čÆäŗŗēę°ę®ååø
樔åä¼å
- å°čÆę“大č§ęØ”ēé¢č®ē»ęØ”å
- å®éŖäøåēå¦ä¹ ēč°åŗ¦ēē„
- ę¢ē“¢ē„čÆčøé¦ē樔åå缩ęęÆ
ē³»ē»éę
- å¼åå®ę¶čÆé³čÆå«APIęå”
- éęčÆé³ē«Æē¹ę£ęµ(VAD)åč½
- ę建å®ę“ēčÆé³å¤ēpipeline
å¤é¢åę©å±
- ę©å±å°å»ēćéčēå ¶ä»äøäøé¢å
- ęÆęå¤čÆčØę··åčÆå«
- å¼åé¢åčŖéåŗēåØēŗæå¦ä¹ ęŗå¶
š” ē»éŖę»ē»
ęåč¦ē“ ļ¼
- š é«č“Øéę°ę®ļ¼ē”®äæę 注åē”®ę§åé³é¢ęø ę°åŗ¦
- āļø åēåę°č®¾ē½®ļ¼ę ¹ę®ē”¬ä»¶čµęŗåę°ę®ē¹ē¹č°ä¼
- š ęē»ēę§ļ¼å®ę¶č·čøŖč®ē»čæåŗ¦å樔åę§č½
- šÆ å åčÆä¼°ļ¼å¤ē»“åŗ¦éŖčÆęØ”åęę
åøøč§ęęļ¼
- š§ ę°ę®äøå¹³č””ļ¼ęäŗäøäøęÆčÆę ·ę¬čæå°
- š¾ čµęŗéå¶ļ¼ę¾ååč®ē»ę¶é“ēęč””
- šļø č¶ åč°ä¼ļ¼éč¦å¤ę¬”å®éŖę¾å°ęä½³é ē½®
- š čæęåé£é©ļ¼å°ę°ę®é容ęåŗē°čæęå
å ³é®å»ŗč®®ļ¼åØčæč”é¢åē¹åč®ē»ę¶ļ¼å”åæ äæęčåæåē³»ē»ę§ēå®éŖę¹ę³ćęÆäøę¬”č°ę“é½č¦ęęē”®ēå设åéŖčÆęŗå¶ļ¼čæę ·ęč½ęē»č·å¾ę»”ęēē»ęć
š åččµęŗ
å®ę¹ę攣
ēøå ³å·„å ·
å¦ä¹ čµęŗ
š ęåä½ å®ęäŗčæäøŖęęę§ē锹ē®ļ¼
čÆé³čÆå«é¢åē¹åč®ē»ęÆäøäøŖéč¦čåæåęå·§ēčæēØļ¼ä½éčæē³»ē»ę§ēę¹ę³åäøęēå®č·µļ¼ä½ äøå®č½å¤ę建åŗę»”č¶³ē¹å®éę±ēé«č“ØéčÆé³čÆå«ē³»ē»ćåøęčæēÆęē« åÆ¹ä½ ēå¦ä¹ åå·„ä½ęęåø®å©ļ¼
å¦ęä½ åØå®č·µčæēØäøéå°ä»»ä½é®é¢ļ¼ę¬¢čæåØčÆč®ŗåŗäŗ¤ęµč®Øč®ŗć让ę们äøčµ·ęØåØčÆé³čÆå«ęęÆēåå±ļ¼
./utils/cer.py
Copyright 2021 The HuggingFace Evaluate Authors.
Licensed under the Apache License, Version 2.0 (the āLicenseā);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an āAS ISā BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
āāā Character Error Ratio (CER) metric. āāā
from typing import List
import datasets
import jiwer
import jiwer.transforms as tr
from datasets.config import PY_VERSION
from packaging import version
import evaluate
if PY_VERSION < version.parse(ā3.8ā):
import importlib_metadata
else:
import importlib.metadata as importlib_metadata
SENTENCE_DELIMITER = āā
if version.parse(importlib_metadata.version(ājiwerā)) < version.parse(ā2.3.0ā):
class SentencesToListOfCharacters(tr.AbstractTransform):
def __init__(self, sentence_delimiter: str = " "):
self.sentence_delimiter = sentence_delimiter
def process_string(self, s: str):
return list(s)
def process_list(self, inp: List[str]):
chars = []
for sent_idx, sentence in enumerate(inp):
chars.extend(self.process_string(sentence))
if self.sentence_delimiter is not None and self.sentence_delimiter != "" and sent_idx < len(inp) - 1:
chars.append(self.sentence_delimiter)
return chars
cer_transform = tr.Compose(
[tr.RemoveMultipleSpaces(), tr.Strip(), SentencesToListOfCharacters(SENTENCE_DELIMITER)]
)
else:
cer_transform = tr.Compose(
[
tr.RemoveMultipleSpaces(),
tr.Strip(),
tr.ReduceToSingleSentence(SENTENCE_DELIMITER),
tr.ReduceToListOfListOfChars(),
]
)
_CITATION = āāā
@inproceedings{inproceedings,
author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
year = {2004},
month = {01},
pages = {},
title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
āāā
_DESCRIPTION = āāā
Character error rate (CER) is a common metric of the performance of an automatic speech recognition system.
CER is similar to Word Error Rate (WER), but operates on character instead of word. Please refer to docs of WER for further information.
Character error rate can be computed as:
CER = (S + D + I) / N = (S + D + I) / (S + D + C)
where
S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct characters,
N is the number of characters in the reference (N=S+D+C).
CERās output is not always a number between 0 and 1, in particular when there is a high number of insertions. This value is often associated to the percentage of characters that were incorrectly predicted. The lower the value, the better the
performance of the ASR system with a CER of 0 being a perfect score.
āāā
_KWARGS_DESCRIPTION = āāā
Computes CER score of transcribed segments against references.
Args:
references: list of references for each speech input.
predictions: list of transcribtions to score.
concatenate_texts: Whether or not to concatenate sentences before evaluation, set to True for more accurate result.
Returns:
(float): the character error rate
Examples:
>>> predictions = ["this is the prediction", "there is an other sample"]
>>> references = ["this is the reference", "there is another one"]
>>> cer = evaluate.load("cer")
>>> cer_score = cer.compute(predictions=predictions, references=references)
>>> print(cer_score)
0.34146341463414637
āāā
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class CER(evaluate.Metric):
def _info(self):
return evaluate.MetricInfo(
description=_DESCRIPTION,
citation=_CITATION,
inputs_description=_KWARGS_DESCRIPTION,
features=datasets.Features(
{
āpredictionsā: datasets.Value(āstringā, id=āsequenceā),
āreferencesā: datasets.Value(āstringā, id=āsequenceā),
}
),
codebase_urls=[āhttps://github.com/jitsi/jiwer/"],
reference_urls=[
āhttps://en.wikipedia.org/wiki/Word_error_rate",
āhttps://sites.google.com/site/textdigitisation/qualitymeasures/computingerrorrates",
],
)
def _compute(self, predictions, references, concatenate_texts=False):
if concatenate_texts:
return jiwer.compute_measures(
references,
predictions,
truth_transform=cer_transform,
hypothesis_transform=cer_transform,
)["wer"]
incorrect = 0
total = 0
for prediction, reference in zip(predictions, references):
measures = jiwer.compute_measures(
reference,
prediction,
truth_transform=cer_transform,
hypothesis_transform=cer_transform,
)
incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
total += measures["substitutions"] + measures["deletions"] + measures["hits"]
return incorrect / total
./utils/wer.py
Copyright 2021 The HuggingFace Evaluate Authors.
Licensed under the Apache License, Version 2.0 (the āLicenseā);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an āAS ISā BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
āāā Word Error Ratio (WER) metric. āāā
import datasets
from jiwer import compute_measures
import evaluate
_CITATION = āāā
@inproceedings{inproceedings,
author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
year = {2004},
month = {01},
pages = {},
title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
āāā
_DESCRIPTION = āāā
Word error rate (WER) is a common metric of the performance of an automatic speech recognition system.
The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort.
This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate.
Word error rate can then be computed as:
WER = (S + D + I) / N = (S + D + I) / (S + D + C)
where
S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct words,
N is the number of words in the reference (N=S+D+C).
This value indicates the average number of errors per reference word. The lower the value, the better the
performance of the ASR system with a WER of 0 being a perfect score.
āāā
_KWARGS_DESCRIPTION = āāā
Compute WER score of transcribed segments against references.
Args:
references: List of references for each speech input.
predictions: List of transcriptions to score.
concatenate_texts (bool, default=False): Whether to concatenate all input texts or compute WER iteratively.
Returns:
(float): the word error rate
Examples:
>>> predictions = ["this is the prediction", "there is an other sample"]
>>> references = ["this is the reference", "there is another one"]
>>> wer = evaluate.load("wer")
>>> wer_score = wer.compute(predictions=predictions, references=references)
>>> print(wer_score)
0.5
āāā
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class WER(evaluate.Metric):
def _info(self):
return evaluate.MetricInfo(
description=_DESCRIPTION,
citation=_CITATION,
inputs_description=_KWARGS_DESCRIPTION,
features=datasets.Features(
{
āpredictionsā: datasets.Value(āstringā, id=āsequenceā),
āreferencesā: datasets.Value(āstringā, id=āsequenceā),
}
),
codebase_urls=[āhttps://github.com/jitsi/jiwer/"],
reference_urls=[
āhttps://en.wikipedia.org/wiki/Word_error_rate",
],
)
def _compute(self, predictions=None, references=None, concatenate_texts=False):
if concatenate_texts:
return compute_measures(references, predictions)["wer"]
else:
incorrect = 0
total = 0
for prediction, reference in zip(predictions, references):
measures = compute_measures(reference, prediction)
incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
total += measures["substitutions"] + measures["deletions"] + measures["hits"]
return incorrect / total
./utils/english.py
import json
import os
import re
from fractions import Fraction
from typing import Iterator, List, Match, Optional, Union
from more_itertools import windowed
from .basic import remove_symbols_and_diacritics
class EnglishNumberNormalizer:
āāā
Convert any spelled-out numbers into arabic numbers, while handling:
- remove any commas
- keep the suffixes such as: `1960s`, `274th`, `32nd`, etc.
- spell out currency symbols after the number. e.g. `$20 million` -> `20000000 dollars`
- spell out `one` and `ones`
- interpret successive single-digit numbers as nominal: `one oh one` -> `101`
"""
def __init__(self):
super().__init__()
self.zeros = {"o", "oh", "zero"}
self.ones = {
name: i
for i, name in enumerate(
[
"one",
"two",
"three",
"four",
"five",
"six",
"seven",
"eight",
"nine",
"ten",
"eleven",
"twelve",
"thirteen",
"fourteen",
"fifteen",
"sixteen",
"seventeen",
"eighteen",
"nineteen",
],
start=1,
)
}
self.ones_plural = {
"sixes" if name == "six" else name + "s": (value, "s")
for name, value in self.ones.items()
}
self.ones_ordinal = {
"zeroth": (0, "th"),
"first": (1, "st"),
"second": (2, "nd"),
"third": (3, "rd"),
"fifth": (5, "th"),
"twelfth": (12, "th"),
**{
name + ("h" if name.endswith("t") else "th"): (value, "th")
for name, value in self.ones.items()
if value > 3 and value != 5 and value != 12
},
}
self.ones_suffixed = {**self.ones_plural, **self.ones_ordinal}
self.tens = {
"twenty": 20,
"thirty": 30,
"forty": 40,
"fifty": 50,
"sixty": 60,
"seventy": 70,
"eighty": 80,
"ninety": 90,
}
self.tens_plural = {
name.replace("y", "ies"): (value, "s") for name, value in self.tens.items()
}
self.tens_ordinal = {
name.replace("y", "ieth"): (value, "th")
for name, value in self.tens.items()
}
self.tens_suffixed = {**self.tens_plural, **self.tens_ordinal}
self.multipliers = {
"hundred": 100,
"thousand": 1_000,
"million": 1_000_000,
"billion": 1_000_000_000,
"trillion": 1_000_000_000_000,
"quadrillion": 1_000_000_000_000_000,
"quintillion": 1_000_000_000_000_000_000,
"sextillion": 1_000_000_000_000_000_000_000,
"septillion": 1_000_000_000_000_000_000_000_000,
"octillion": 1_000_000_000_000_000_000_000_000_000,
"nonillion": 1_000_000_000_000_000_000_000_000_000_000,
"decillion": 1_000_000_000_000_000_000_000_000_000_000_000,
}
self.multipliers_plural = {
name + "s": (value, "s") for name, value in self.multipliers.items()
}
self.multipliers_ordinal = {
name + "th": (value, "th") for name, value in self.multipliers.items()
}
self.multipliers_suffixed = {
**self.multipliers_plural,
**self.multipliers_ordinal,
}
self.decimals = {*self.ones, *self.tens, *self.zeros}
self.preceding_prefixers = {
"minus": "-",
"negative": "-",
"plus": "+",
"positive": "+",
}
self.following_prefixers = {
"pound": "Ā£",
"pounds": "Ā£",
"euro": "ā¬",
"euros": "ā¬",
"dollar": "$",
"dollars": "$",
"cent": "Ā¢",
"cents": "Ā¢",
}
self.prefixes = set(
list(self.preceding_prefixers.values())
+ list(self.following_prefixers.values())
)
self.suffixers = {
"per": {"cent": "%"},
"percent": "%",
}
self.specials = {"and", "double", "triple", "point"}
self.words = set(
[
key
for mapping in [
self.zeros,
self.ones,
self.ones_suffixed,
self.tens,
self.tens_suffixed,
self.multipliers,
self.multipliers_suffixed,
self.preceding_prefixers,
self.following_prefixers,
self.suffixers,
self.specials,
]
for key in mapping
]
)
self.literal_words = {"one", "ones"}
def process_words(self, words: List[str]) -> Iterator[str]:
prefix: Optional[str] = None
value: Optional[Union[str, int]] = None
skip = False
def to_fraction(s: str):
try:
return Fraction(s)
except ValueError:
return None
def output(result: Union[str, int]):
nonlocal prefix, value
result = str(result)
if prefix is not None:
result = prefix + result
value = None
prefix = None
return result
if len(words) == 0:
return
for prev, current, next in windowed([None] + words + [None], 3):
if skip:
skip = False
continue
next_is_numeric = next is not None and re.match(r"^\d+(\.\d+)?$", next)
has_prefix = current[0] in self.prefixes
current_without_prefix = current[1:] if has_prefix else current
if re.match(r"^\d+(\.\d+)?$", current_without_prefix):
# arabic numbers (potentially with signs and fractions)
f = to_fraction(current_without_prefix)
assert f is not None
if value is not None:
if isinstance(value, str) and value.endswith("."):
# concatenate decimals / ip address components
value = str(value) + str(current)
continue
else:
yield output(value)
prefix = current[0] if has_prefix else prefix
if f.denominator == 1:
value = f.numerator # store integers as int
else:
value = current_without_prefix
elif current not in self.words:
# non-numeric words
if value is not None:
yield output(value)
yield output(current)
elif current in self.zeros:
value = str(value or "") + "0"
elif current in self.ones:
ones = self.ones[current]
if value is None:
value = ones
elif isinstance(value, str) or prev in self.ones:
if (
prev in self.tens and ones < 10
): # replace the last zero with the digit
assert value[-1] == "0"
value = value[:-1] + str(ones)
else:
value = str(value) + str(ones)
elif ones < 10:
if value % 10 == 0:
value += ones
else:
value = str(value) + str(ones)
else: # eleven to nineteen
if value % 100 == 0:
value += ones
else:
value = str(value) + str(ones)
elif current in self.ones_suffixed:
# ordinal or cardinal; yield the number right away
ones, suffix = self.ones_suffixed[current]
if value is None:
yield output(str(ones) + suffix)
elif isinstance(value, str) or prev in self.ones:
if prev in self.tens and ones < 10:
assert value[-1] == "0"
yield output(value[:-1] + str(ones) + suffix)
else:
yield output(str(value) + str(ones) + suffix)
elif ones < 10:
if value % 10 == 0:
yield output(str(value + ones) + suffix)
else:
yield output(str(value) + str(ones) + suffix)
else: # eleven to nineteen
if value % 100 == 0:
yield output(str(value + ones) + suffix)
else:
yield output(str(value) + str(ones) + suffix)
value = None
elif current in self.tens:
tens = self.tens[current]
if value is None:
value = tens
elif isinstance(value, str):
value = str(value) + str(tens)
else:
if value % 100 == 0:
value += tens
else:
value = str(value) + str(tens)
elif current in self.tens_suffixed:
# ordinal or cardinal; yield the number right away
tens, suffix = self.tens_suffixed[current]
if value is None:
yield output(str(tens) + suffix)
elif isinstance(value, str):
yield output(str(value) + str(tens) + suffix)
else:
if value % 100 == 0:
yield output(str(value + tens) + suffix)
else:
yield output(str(value) + str(tens) + suffix)
elif current in self.multipliers:
multiplier = self.multipliers[current]
if value is None:
value = multiplier
elif isinstance(value, str) or value == 0:
f = to_fraction(value)
p = f * multiplier if f is not None else None
if f is not None and p.denominator == 1:
value = p.numerator
else:
yield output(value)
value = multiplier
else:
before = value // 1000 * 1000
residual = value % 1000
value = before + residual * multiplier
elif current in self.multipliers_suffixed:
multiplier, suffix = self.multipliers_suffixed[current]
if value is None:
yield output(str(multiplier) + suffix)
elif isinstance(value, str):
f = to_fraction(value)
p = f * multiplier if f is not None else None
if f is not None and p.denominator == 1:
yield output(str(p.numerator) + suffix)
else:
yield output(value)
yield output(str(multiplier) + suffix)
else: # int
before = value // 1000 * 1000
residual = value % 1000
value = before + residual * multiplier
yield output(str(value) + suffix)
value = None
elif current in self.preceding_prefixers:
# apply prefix (positive, minus, etc.) if it precedes a number
if value is not None:
yield output(value)
if next in self.words or next_is_numeric:
prefix = self.preceding_prefixers[current]
else:
yield output(current)
elif current in self.following_prefixers:
# apply prefix (dollars, cents, etc.) only after a number
if value is not None:
prefix = self.following_prefixers[current]
yield output(value)
else:
yield output(current)
elif current in self.suffixers:
# apply suffix symbols (percent -> '%')
if value is not None:
suffix = self.suffixers[current]
if isinstance(suffix, dict):
if next in suffix:
yield output(str(value) + suffix[next])
skip = True
else:
yield output(value)
yield output(current)
else:
yield output(str(value) + suffix)
else:
yield output(current)
elif current in self.specials:
if next not in self.words and not next_is_numeric:
# apply special handling only if the next word can be numeric
if value is not None:
yield output(value)
yield output(current)
elif current == "and":
# ignore "and" after hundreds, thousands, etc.
if prev not in self.multipliers:
if value is not None:
yield output(value)
yield output(current)
elif current == "double" or current == "triple":
if next in self.ones or next in self.zeros:
repeats = 2 if current == "double" else 3
ones = self.ones.get(next, 0)
value = str(value or "") + str(ones) * repeats
skip = True
else:
if value is not None:
yield output(value)
yield output(current)
elif current == "point":
if next in self.decimals or next_is_numeric:
value = str(value or "") + "."
else:
# should all have been covered at this point
raise ValueError(f"Unexpected token: {current}")
else:
# all should have been covered at this point
raise ValueError(f"Unexpected token: {current}")
if value is not None:
yield output(value)
def preprocess(self, s: str):
# replace "<number> and a half" with "<number> point five"
results = []
segments = re.split(r"\band\s+a\s+half\b", s)
for i, segment in enumerate(segments):
if len(segment.strip()) == 0:
continue
if i == len(segments) - 1:
results.append(segment)
else:
results.append(segment)
last_word = segment.rsplit(maxsplit=2)[-1]
if last_word in self.decimals or last_word in self.multipliers:
results.append("point five")
else:
results.append("and a half")
s = " ".join(results)
# put a space at number/letter boundary
s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)
# but remove spaces which could be a suffix
s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)
return s
def postprocess(self, s: str):
def combine_cents(m: Match):
try:
currency = m.group(1)
integer = m.group(2)
cents = int(m.group(3))
return f"{currency}{integer}.{cents:02d}"
except ValueError:
return m.string
def extract_cents(m: Match):
try:
return f"Ā¢{int(m.group(1))}"
except ValueError:
return m.string
# apply currency postprocessing; "$2 and ¢7" -> "$2.07"
s = re.sub(r"([ā¬Ā£$])([0-9]+) (?:and )?Ā¢([0-9]{1,2})\b", combine_cents, s)
s = re.sub(r"[ā¬Ā£$]0.([0-9]{1,2})\b", extract_cents, s)
# write "one(s)" instead of "1(s)", just for the readability
s = re.sub(r"\b1(s?)\b", r"one\1", s)
return s
def __call__(self, s: str):
s = self.preprocess(s)
s = " ".join(word for word in self.process_words(s.split()) if word is not None)
s = self.postprocess(s)
return s
class EnglishSpellingNormalizer:
āāā
Applies British-American spelling mappings as listed in [1].
[1] https://www.tysto.com/uk-us-spelling-list.html
"""
def __init__(self):
mapping_path = os.path.join(os.path.dirname(__file__), "english.json")
self.mapping = json.load(open(mapping_path))
def __call__(self, s: str):
return " ".join(self.mapping.get(word, word) for word in s.split())
class EnglishTextNormalizer:
def init(self):
self.ignore_patterns = rā\b(hmm|mm|mhm|mmm|uh|um)\bā
self.replacers = {
# common contractions
rā\bwonāt\bā: āwill notā,
rā\bcanāt\bā: ācan notā,
rā\bletās\bā: ālet usā,
rā\baināt\bā: āaintā,
rā\byāall\bā: āyou allā,
rā\bwanna\bā: āwant toā,
rā\bgotta\bā: āgot toā,
rā\bgonna\bā: āgoing toā,
rā\biāma\bā: āi am going toā,
rā\bimma\bā: āi am going toā,
rā\bwoulda\bā: āwould haveā,
rā\bcoulda\bā: ācould haveā,
rā\bshoulda\bā: āshould haveā,
rā\bmaāam\bā: āmadamā,
# contractions in titles/prefixes
rā\bmr\bā: āmister ā,
rā\bmrs\bā: āmissus ā,
rā\bst\bā: āsaint ā,
rā\bdr\bā: ādoctor ā,
rā\bprof\bā: āprofessor ā,
rā\bcapt\bā: ācaptain ā,
rā\bgov\bā: āgovernor ā,
rā\bald\bā: āalderman ā,
rā\bgen\bā: āgeneral ā,
rā\bsen\bā: āsenator ā,
rā\brep\bā: ārepresentative ā,
rā\bpres\bā: āpresident ā,
rā\brev\bā: āreverend ā,
rā\bhon\bā: āhonorable ā,
rā\basst\bā: āassistant ā,
rā\bassoc\bā: āassociate ā,
rā\blt\bā: ālieutenant ā,
rā\bcol\bā: ācolonel ā,
rā\bjr\bā: ājunior ā,
rā\bsr\bā: āsenior ā,
rā\besq\bā: āesquire ā,
# prefect tenses, ideally it should be any past participles, but itās harder..
rāād been\bā: ā had beenā,
rāās been\bā: ā has beenā,
rāād gone\bā: ā had goneā,
rāās gone\bā: ā has goneā,
rāād done\bā: ā had doneā, # āās doneā is ambiguous
rāās got\bā: ā has gotā,
# general contractions
rānāt\bā: ā notā,
rāāre\bā: ā areā,
rāās\bā: ā isā,
rāād\bā: ā wouldā,
rāāll\bā: ā willā,
rāāt\bā: ā notā,
rāāve\bā: ā haveā,
rāām\bā: ā amā,
}
self.standardize_numbers = EnglishNumberNormalizer()
self.standardize_spellings = EnglishSpellingNormalizer()
def __call__(self, s: str):
s = s.lower()
s = re.sub(r"[<\[][^>\]]*[>\]]", "", s) # remove words between brackets
s = re.sub(r"\(([^)]+?)\)", "", s) # remove words between parenthesis
s = re.sub(self.ignore_patterns, "", s)
s = re.sub(r"\s+'", "'", s) # when there's a space before an apostrophe
for pattern, replacement in self.replacers.items():
s = re.sub(pattern, replacement, s)
s = re.sub(r"(\d),(\d)", r"\1\2", s) # remove commas between digits
s = re.sub(r"\.([^0-9]|$)", r" \1", s) # remove periods not followed by numbers
s = remove_symbols_and_diacritics(s, keep=".%$Ā¢ā¬Ā£") # keep numeric symbols
s = self.standardize_numbers(s)
s = self.standardize_spellings(s)
# now remove prefix/suffix symbols that are not preceded/followed by numbers
s = re.sub(r"[.$Ā¢ā¬Ā£]([^0-9])", r" \1", s)
s = re.sub(r"([^0-9])%", r"\1 ", s)
s = re.sub(r"\s+", " ", s) # replace any successive whitespaces with a space
return s
./utils/english.json
{
āaccessoriseā: āaccessorizeā,
āaccessorisedā: āaccessorizedā,
āaccessorisesā: āaccessorizesā,
āaccessorisingā: āaccessorizingā,
āacclimatisationā: āacclimatizationā,
āacclimatiseā: āacclimatizeā,
āacclimatisedā: āacclimatizedā,
āacclimatisesā: āacclimatizesā,
āacclimatisingā: āacclimatizingā,
āaccoutrementsā: āaccoutermentsā,
āaeonā: āeonā,
āaeonsā: āeonsā,
āaerogrammeā: āaerogramā,
āaerogrammesā: āaerogramsā,
āaeroplaneā: āairplaneā,
āaeroplanesā: āairplanesā,
āaestheteā: āestheteā,
āaesthetesā: āesthetesā,
āaestheticā: āestheticā,
āaestheticallyā: āestheticallyā,
āaestheticsā: āestheticsā,
āaetiologyā: āetiologyā,
āageingā: āagingā,
āaggrandisementā: āaggrandizementā,
āagoniseā: āagonizeā,
āagonisedā: āagonizedā,
āagonisesā: āagonizesā,
āagonisingā: āagonizingā,
āagonisinglyā: āagonizinglyā,
āalmanackā: āalmanacā,
āalmanacksā: āalmanacsā,
āaluminiumā: āaluminumā,
āamortisableā: āamortizableā,
āamortisationā: āamortizationā,
āamortisationsā: āamortizationsā,
āamortiseā: āamortizeā,
āamortisedā: āamortizedā,
āamortisesā: āamortizesā,
āamortisingā: āamortizingā,
āamphitheatreā: āamphitheaterā,
āamphitheatresā: āamphitheatersā,
āanaemiaā: āanemiaā,
āanaemicā: āanemicā,
āanaesthesiaā: āanesthesiaā,
āanaestheticā: āanestheticā,
āanaestheticsā: āanestheticsā,
āanaesthetiseā: āanesthetizeā,
āanaesthetisedā: āanesthetizedā,
āanaesthetisesā: āanesthetizesā,
āanaesthetisingā: āanesthetizingā,
āanaesthetistā: āanesthetistā,
āanaesthetistsā: āanesthetistsā,
āanaesthetizeā: āanesthetizeā,
āanaesthetizedā: āanesthetizedā,
āanaesthetizesā: āanesthetizesā,
āanaesthetizingā: āanesthetizingā,
āanalogueā: āanalogā,
āanaloguesā: āanalogsā,
āanalyseā: āanalyzeā,
āanalysedā: āanalyzedā,
āanalysesā: āanalyzesā,
āanalysingā: āanalyzingā,
āangliciseā: āanglicizeā,
āanglicisedā: āanglicizedā,
āanglicisesā: āanglicizesā,
āanglicisingā: āanglicizingā,
āannualisedā: āannualizedā,
āantagoniseā: āantagonizeā,
āantagonisedā: āantagonizedā,
āantagonisesā: āantagonizesā,
āantagonisingā: āantagonizingā,
āapologiseā: āapologizeā,
āapologisedā: āapologizedā,
āapologisesā: āapologizesā,
āapologisingā: āapologizingā,
āappalā: āappallā,
āappalsā: āappallsā,
āappetiserā: āappetizerā,
āappetisersā: āappetizersā,
āappetisingā: āappetizingā,
āappetisinglyā: āappetizinglyā,
āarbourā: āarborā,
āarboursā: āarborsā,
āarcheologicalā: āarchaeologicalā,
āarchaeologicallyā: āarcheologicallyā,
āarchaeologistā: āarcheologistā,
āarchaeologistsā: āarcheologistsā,
āarchaeologyā: āarcheologyā,
āardourā: āardorā,
āarmourā: āarmorā,
āarmouredā: āarmoredā,
āarmourerā: āarmorerā,
āarmourersā: āarmorersā,
āarmouriesā: āarmoriesā,
āarmouryā: āarmoryā,
āartefactā: āartifactā,
āartefactsā: āartifactsā,
āauthoriseā: āauthorizeā,
āauthorisedā: āauthorizedā,
āauthorisesā: āauthorizesā,
āauthorisingā: āauthorizingā,
āaxeā: āaxā,
ābackpedalledā: ābackpedaledā,
ābackpedallingā: ābackpedalingā,
ābannisterā: ābanisterā,
ābannistersā: ābanistersā,
ābaptiseā: ābaptizeā,
ābaptisedā: ābaptizedā,
ābaptisesā: ābaptizesā,
ābaptisingā: ābaptizingā,
ābastardiseā: ābastardizeā,
ābastardisedā: ābastardizedā,
ābastardisesā: ābastardizesā,
ābastardisingā: ābastardizingā,
ābattleaxā: ābattleaxeā,
ābaulkā: ābalkā,
ābaulkedā: ābalkedā,
ābaulkingā: ābalkingā,
ābaulksā: ābalksā,
ābedevilledā: ābedeviledā,
ābedevillingā: ābedevilingā,
ābehaviourā: ābehaviorā,
ābehaviouralā: ābehavioralā,
ābehaviourismā: ābehaviorismā,
ābehaviouristā: ābehavioristā,
ābehaviouristsā: ābehavioristsā,
ābehavioursā: ābehaviorsā,
ābehoveā: ābehooveā,
ābehovedā: ābehoovedā,
ābehovesā: ābehoovesā,
ābejewelledā: ābejeweledā,
ābelabourā: ābelaborā,
ābelabouredā: ābelaboredā,
ābelabouringā: ābelaboringā,
ābelaboursā: ābelaborsā,
ābevelledā: ābeveledā,
ābevviesā: ābeviesā,
ābevvyā: ābevyā,
ābiassedā: ābiasedā,
ābiassingā: ābiasingā,
ābingeingā: ābingingā,
ābougainvillaeaā: ābougainvilleaā,
ābougainvillaeasā: ābougainvilleasā,
ābowdleriseā: ābowdlerizeā,
ābowdlerisedā: ābowdlerizedā,
ābowdlerisesā: ābowdlerizesā,
ābowdlerisingā: ābowdlerizingā,
ābreathalyseā: ābreathalyzeā,
ābreathalysedā: ābreathalyzedā,
ābreathalyserā: ābreathalyzerā,
ābreathalysersā: ābreathalyzersā,
ābreathalysesā: ābreathalyzesā,
ābreathalysingā: ābreathalyzingā,
ābrutaliseā: ābrutalizeā,
ābrutalisedā: ābrutalizedā,
ābrutalisesā: ābrutalizesā,
ābrutalisingā: ābrutalizingā,
ābussesā: ābusesā,
ābussingā: ābusingā,
ācaesareanā: ācesareanā,
ācaesareansā: ācesareansā,
ācalibreā: ācaliberā,
ācalibresā: ācalibersā,
ācalliperā: ācaliperā,
ācallipersā: ācalipersā,
ācallisthenicsā: ācalisthenicsā,
ācanaliseā: ācanalizeā,
ācanalisedā: ācanalizedā,
ācanalisesā: ācanalizesā,
ācanalisingā: ācanalizingā,
ācancelationā: ācancellationā,
ācancelationsā: ācancellationsā,
ācancelledā: ācanceledā,
ācancellingā: ācancelingā,
ācandourā: ācandorā,
ācannibaliseā: ācannibalizeā,
ācannibalisedā: ācannibalizedā,
ācannibalisesā: ācannibalizesā,
ācannibalisingā: ācannibalizingā,
ācanoniseā: ācanonizeā,
ācanonisedā: ācanonizedā,
ācanonisesā: ācanonizesā,
ācanonisingā: ācanonizingā,
ācapitaliseā: ācapitalizeā,
ācapitalisedā: ācapitalizedā,
ācapitalisesā: ācapitalizesā,
ācapitalisingā: ācapitalizingā,
ācarameliseā: ācaramelizeā,
ācaramelisedā: ācaramelizedā,
ācaramelisesā: ācaramelizesā,
ācaramelisingā: ācaramelizingā,
ācarboniseā: ācarbonizeā,
ācarbonisedā: ācarbonizedā,
ācarbonisesā: ācarbonizesā,
ācarbonisingā: ācarbonizingā,
ācarolledā: ācaroledā,
ācarollingā: ācarolingā,
ācatalogueā: ācatalogā,
ācataloguedā: ācatalogedā,
ācataloguesā: ācatalogsā,
ācataloguingā: ācatalogingā,
ācatalyseā: ācatalyzeā,
ācatalysedā: ācatalyzedā,
ācatalysesā: ācatalyzesā,
ācatalysingā: ācatalyzingā,
ācategoriseā: ācategorizeā,
ācategorisedā: ācategorizedā,
ācategorisesā: ācategorizesā,
ācategorisingā: ācategorizingā,
ācauteriseā: ācauterizeā,
ācauterisedā: ācauterizedā,
ācauterisesā: ācauterizesā,
ācauterisingā: ācauterizingā,
ācavilledā: ācaviledā,
ācavillingā: ācavilingā,
ācentigrammeā: ācentigramā,
ācentigrammesā: ācentigramsā,
ācentilitreā: ācentiliterā,
ācentilitresā: ācentilitersā,
ācentimetreā: ācentimeterā,
ācentimetresā: ācentimetersā,
ācentraliseā: ācentralizeā,
ācentralisedā: ācentralizedā,
ācentralisesā: ācentralizesā,
ācentralisingā: ācentralizingā,
ācentreā: ācenterā,
ācentredā: ācenteredā,
ācentrefoldā: ācenterfoldā,
ācentrefoldsā: ācenterfoldsā,
ācentrepieceā: ācenterpieceā,
ācentrepiecesā: ācenterpiecesā,
ācentresā: ācentersā,
āchannelledā: āchanneledā,
āchannellingā: āchannelingā,
ācharacteriseā: ācharacterizeā,
ācharacterisedā: ācharacterizedā,
ācharacterisesā: ācharacterizesā,
ācharacterisingā: ācharacterizingā,
āchequeā: ācheckā,
āchequebookā: ācheckbookā,
āchequebooksā: ācheckbooksā,
āchequeredā: ācheckeredā,
āchequesā: āchecksā,
āchilliā: āchiliā,
āchimaeraā: āchimeraā,
āchimaerasā: āchimerasā,
āchiselledā: āchiseledā,
āchisellingā: āchiselingā,
ācirculariseā: ācircularizeā,
ācircularisedā: ācircularizedā,
ācircularisesā: ācircularizesā,
ācircularisingā: ācircularizingā,
āciviliseā: ācivilizeā,
ācivilisedā: ācivilizedā,
ācivilisesā: ācivilizesā,
ācivilisingā: ācivilizingā,
āclamourā: āclamorā,
āclamouredā: āclamoredā,
āclamouringā: āclamoringā,
āclamoursā: āclamorsā,
āclangourā: āclangorā,
āclarinettistā: āclarinetistā,
āclarinettistsā: āclarinetistsā,
ācollectiviseā: ācollectivizeā,
ācollectivisedā: ācollectivizedā,
ācollectivisesā: ācollectivizesā,
ācollectivisingā: ācollectivizingā,
ācolonisationā: ācolonizationā,
ācoloniseā: ācolonizeā,
ācolonisedā: ācolonizedā,
ācoloniserā: ācolonizerā,
ācolonisersā: ācolonizersā,
ācolonisesā: ācolonizesā,
ācolonisingā: ācolonizingā,
ācolourā: ācolorā,
ācolourantā: ācolorantā,
ācolourantsā: ācolorantsā,
ācolouredā: ācoloredā,
ācolouredsā: ācoloredsā,
ācolourfulā: ācolorfulā,
ācolourfullyā: ācolorfullyā,
ācolouringā: ācoloringā,
ācolourizeā: ācolorizeā,
ācolourizedā: ācolorizedā,
ācolourizesā: ācolorizesā,
ācolourizingā: ācolorizingā,
ācolourlessā: ācolorlessā,
ācoloursā: ācolorsā,
ācommercialiseā: ācommercializeā,
ācommercialisedā: ācommercializedā,
ācommercialisesā: ācommercializesā,
ācommercialisingā: ācommercializingā,
ācompartmentaliseā: ācompartmentalizeā,
ācompartmentalisedā: ācompartmentalizedā,
ācompartmentalisesā: ācompartmentalizesā,
ācompartmentalisingā: ācompartmentalizingā,
ācomputeriseā: ācomputerizeā,
ācomputerisedā: ācomputerizedā,
ācomputerisesā: ācomputerizesā,
ācomputerisingā: ācomputerizingā,
āconceptualiseā: āconceptualizeā,
āconceptualisedā: āconceptualizedā,
āconceptualisesā: āconceptualizesā,
āconceptualisingā: āconceptualizingā,
āconnexionā: āconnectionā,
āconnexionsā: āconnectionsā,
ācontextualiseā: ācontextualizeā,
ācontextualisedā: ācontextualizedā,
ācontextualisesā: ācontextualizesā,
ācontextualisingā: ācontextualizingā,
ācosierā: ācozierā,
ācosiesā: ācoziesā,
ācosiestā: ācoziestā,
ācosilyā: ācozilyā,
ācosinessā: ācozinessā,
ācosyā: ācozyā,
ācouncillorā: ācouncilorā,
ācouncillorsā: ācouncilorsā,
ācounselledā: ācounseledā,
ācounsellingā: ācounselingā,
ācounsellorā: ācounselorā,
ācounsellorsā: ācounselorsā,
ācrenelatedā: ācrenellatedā,
ācriminaliseā: ācriminalizeā,
ācriminalisedā: ācriminalizedā,
ācriminalisesā: ācriminalizesā,
ācriminalisingā: ācriminalizingā,
ācriticiseā: ācriticizeā,
ācriticisedā: ācriticizedā,
ācriticisesā: ācriticizesā,
ācriticisingā: ācriticizingā,
ācruellerā: ācruelerā,
ācruellestā: ācruelestā,
ācrystallisationā: ācrystallizationā,
ācrystalliseā: ācrystallizeā,
ācrystallisedā: ācrystallizedā,
ācrystallisesā: ācrystallizesā,
ācrystallisingā: ācrystallizingā,
ācudgelledā: ācudgeledā,
ācudgellingā: ācudgelingā,
ācustomiseā: ācustomizeā,
ācustomisedā: ācustomizedā,
ācustomisesā: ācustomizesā,
ācustomisingā: ācustomizingā,
ācypherā: ācipherā,
ācyphersā: āciphersā,
ādecentralisationā: ādecentralizationā,
ādecentraliseā: ādecentralizeā,
ādecentralisedā: ādecentralizedā,
ādecentralisesā: ādecentralizesā,
ādecentralisingā: ādecentralizingā,
ādecriminalisationā: ādecriminalizationā,
ādecriminaliseā: ādecriminalizeā,
ādecriminalisedā: ādecriminalizedā,
ādecriminalisesā: ādecriminalizesā,
ādecriminalisingā: ādecriminalizingā,
ādefenceā: ādefenseā,
ādefencelessā: ādefenselessā,
ādefencesā: ādefensesā,
ādehumanisationā: ādehumanizationā,
ādehumaniseā: ādehumanizeā,
ādehumanisedā: ādehumanizedā,
ādehumanisesā: ādehumanizesā,
ādehumanisingā: ādehumanizingā,
ādemeanourā: ādemeanorā,
ādemilitarisationā: ādemilitarizationā,
ādemilitariseā: ādemilitarizeā,
ādemilitarisedā: ādemilitarizedā,
ādemilitarisesā: ādemilitarizesā,
ādemilitarisingā: ādemilitarizingā,
ādemobilisationā: ādemobilizationā,
ādemobiliseā: ādemobilizeā,
ādemobilisedā: ādemobilizedā,
ādemobilisesā: ādemobilizesā,
ādemobilisingā: ādemobilizingā,
ādemocratisationā: ādemocratizationā,
ādemocratiseā: ādemocratizeā,
ādemocratisedā: ādemocratizedā,
ādemocratisesā: ādemocratizesā,
ādemocratisingā: ādemocratizingā,
ādemoniseā: ādemonizeā,
ādemonisedā: ādemonizedā,
ādemonisesā: ādemonizesā,
ādemonisingā: ādemonizingā,
ādemoralisationā: ādemoralizationā,
ādemoraliseā: ādemoralizeā,
ādemoralisedā: ādemoralizedā,
ādemoralisesā: ādemoralizesā,
ādemoralisingā: ādemoralizingā,
ādenationalisationā: ādenationalizationā,
ādenationaliseā: ādenationalizeā,
ādenationalisedā: ādenationalizedā,
ādenationalisesā: ādenationalizesā,
ādenationalisingā: ādenationalizingā,
ādeodoriseā: ādeodorizeā,
ādeodorisedā: ādeodorizedā,
ādeodorisesā: ādeodorizesā,
ādeodorisingā: ādeodorizingā,
ādepersonaliseā: ādepersonalizeā,
ādepersonalisedā: ādepersonalizedā,
ādepersonalisesā: ādepersonalizesā,
ādepersonalisingā: ādepersonalizingā,
ādeputiseā: ādeputizeā,
ādeputisedā: ādeputizedā,
ādeputisesā: ādeputizesā,
ādeputisingā: ādeputizingā,
ādesensitisationā: ādesensitizationā,
ādesensitiseā: ādesensitizeā,
ādesensitisedā: ādesensitizedā,
ādesensitisesā: ādesensitizesā,
ādesensitisingā: ādesensitizingā,
ādestabilisationā: ādestabilizationā,
ādestabiliseā: ādestabilizeā,
ādestabilisedā: ādestabilizedā,
ādestabilisesā: ādestabilizesā,
ādestabilisingā: ādestabilizingā,
ādialledā: ādialedā,
ādiallingā: ādialingā,
ādialogueā: ādialogā,
ādialoguesā: ādialogsā,
ādiarrhoeaā: ādiarrheaā,
ādigitiseā: ādigitizeā,
ādigitisedā: ādigitizedā,
ādigitisesā: ādigitizesā,
ādigitisingā: ādigitizingā,
ādiscā: ādiskā,
ādiscolourā: ādiscolorā,
ādiscolouredā: ādiscoloredā,
ādiscolouringā: ādiscoloringā,
ādiscoloursā: ādiscolorsā,
ādiscsā: ādisksā,
ādisembowelledā: ādisemboweledā,
ādisembowellingā: ādisembowelingā,
ādisfavourā: ādisfavorā,
ādishevelledā: ādisheveledā,
ādishonourā: ādishonorā,
ādishonourableā: ādishonorableā,
ādishonourablyā: ādishonorablyā,
ādishonouredā: ādishonoredā,
ādishonouringā: ādishonoringā,
ādishonoursā: ādishonorsā,
ādisorganisationā: ādisorganizationā,
ādisorganisedā: ādisorganizedā,
ādistilā: ādistillā,
ādistilsā: ādistillsā,
ādramatisationā: ādramatizationā,
ādramatisationsā: ādramatizationsā,
ādramatiseā: ādramatizeā,
ādramatisedā: ādramatizedā,
ādramatisesā: ādramatizesā,
ādramatisingā: ādramatizingā,
ādraughtā: ādraftā,
ādraughtboardā: ādraftboardā,
ādraughtboardsā: ādraftboardsā,
ādraughtierā: ādraftierā,
ādraughtiestā: ādraftiestā,
ādraughtsā: ādraftsā,
ādraughtsmanā: ādraftsmanā,
ādraughtsmanshipā: ādraftsmanshipā,
ādraughtsmenā: ādraftsmenā,
ādraughtswomanā: ādraftswomanā,
ādraughtswomenā: ādraftswomenā,
ādraughtyā: ādraftyā,
ādrivelledā: ādriveledā,
ādrivellingā: ādrivelingā,
āduelledā: ādueledā,
āduellingā: āduelingā,
āeconomiseā: āeconomizeā,
āeconomisedā: āeconomizedā,
āeconomisesā: āeconomizesā,
āeconomisingā: āeconomizingā,
āedoemaā: āedemaā,
āeditorialiseā: āeditorializeā,
āeditorialisedā: āeditorializedā,
āeditorialisesā: āeditorializesā,
āeditorialisingā: āeditorializingā,
āempathiseā: āempathizeā,
āempathisedā: āempathizedā,
āempathisesā: āempathizesā,
āempathisingā: āempathizingā,
āemphasiseā: āemphasizeā,
āemphasisedā: āemphasizedā,
āemphasisesā: āemphasizesā,
āemphasisingā: āemphasizingā,
āenamelledā: āenameledā,
āenamellingā: āenamelingā,
āenamouredā: āenamoredā,
āencyclopaediaā: āencyclopediaā,
āencyclopaediasā: āencyclopediasā,
āencyclopaedicā: āencyclopedicā,
āendeavourā: āendeavorā,
āendeavouredā: āendeavoredā,
āendeavouringā: āendeavoringā,
āendeavoursā: āendeavorsā,
āenergiseā: āenergizeā,
āenergisedā: āenergizedā,
āenergisesā: āenergizesā,
āenergisingā: āenergizingā,
āenrolā: āenrollā,
āenrolsā: āenrollsā,
āenthralā: āenthrallā,
āenthralsā: āenthrallsā,
āepauletteā: āepauletā,
āepaulettesā: āepauletsā,
āepicentreā: āepicenterā,
āepicentresā: āepicentersā,
āepilogueā: āepilogā,
āepiloguesā: āepilogsā,
āepitomiseā: āepitomizeā,
āepitomisedā: āepitomizedā,
āepitomisesā: āepitomizesā,
āepitomisingā: āepitomizingā,
āequalisationā: āequalizationā,
āequaliseā: āequalizeā,
āequalisedā: āequalizedā,
āequaliserā: āequalizerā,
āequalisersā: āequalizersā,
āequalisesā: āequalizesā,
āequalisingā: āequalizingā,
āeulogiseā: āeulogizeā,
āeulogisedā: āeulogizedā,
āeulogisesā: āeulogizesā,
āeulogisingā: āeulogizingā,
āevangeliseā: āevangelizeā,
āevangelisedā: āevangelizedā,
āevangelisesā: āevangelizesā,
āevangelisingā: āevangelizingā,
āexorciseā: āexorcizeā,
āexorcisedā: āexorcizedā,
āexorcisesā: āexorcizesā,
āexorcisingā: āexorcizingā,
āextemporisationā: āextemporizationā,
āextemporiseā: āextemporizeā,
āextemporisedā: āextemporizedā,
āextemporisesā: āextemporizesā,
āextemporisingā: āextemporizingā,
āexternalisationā: āexternalizationā,
āexternalisationsā: āexternalizationsā,
āexternaliseā: āexternalizeā,
āexternalisedā: āexternalizedā,
āexternalisesā: āexternalizesā,
āexternalisingā: āexternalizingā,
āfactoriseā: āfactorizeā,
āfactorisedā: āfactorizedā,
āfactorisesā: āfactorizesā,
āfactorisingā: āfactorizingā,
āfaecalā: āfecalā,
āfaecesā: āfecesā,
āfamiliarisationā: āfamiliarizationā,
āfamiliariseā: āfamiliarizeā,
āfamiliarisedā: āfamiliarizedā,
āfamiliarisesā: āfamiliarizesā,
āfamiliarisingā: āfamiliarizingā,
āfantasiseā: āfantasizeā,
āfantasisedā: āfantasizedā,
āfantasisesā: āfantasizesā,
āfantasisingā: āfantasizingā,
āfavourā: āfavorā,
āfavourableā: āfavorableā,
āfavourablyā: āfavorablyā,
āfavouredā: āfavoredā,
āfavouringā: āfavoringā,
āfavouriteā: āfavoriteā,
āfavouritesā: āfavoritesā,
āfavouritismā: āfavoritismā,
āfavoursā: āfavorsā,
āfeminiseā: āfeminizeā,
āfeminisedā: āfeminizedā,
āfeminisesā: āfeminizesā,
āfeminisingā: āfeminizingā,
āfertilisationā: āfertilizationā,
āfertiliseā: āfertilizeā,
āfertilisedā: āfertilizedā,
āfertiliserā: āfertilizerā,
āfertilisersā: āfertilizersā,
āfertilisesā: āfertilizesā,
āfertilisingā: āfertilizingā,
āfervourā: āfervorā,
āfibreā: āfiberā,
āfibreglassā: āfiberglassā,
āfibresā: āfibersā,
āfictionalisationā: āfictionalizationā,
āfictionalisationsā: āfictionalizationsā,
āfictionaliseā: āfictionalizeā,
āfictionalisedā: āfictionalizedā,
āfictionalisesā: āfictionalizesā,
āfictionalisingā: āfictionalizingā,
āfilletā: āfiletā,
āfilletedā: āfiletedā,
āfilletingā: āfiletingā,
āfilletsā: āfiletsā,
āfinalisationā: āfinalizationā,
āfinaliseā: āfinalizeā,
āfinalisedā: āfinalizedā,
āfinalisesā: āfinalizesā,
āfinalisingā: āfinalizingā,
āflautistā: āflutistā,
āflautistsā: āflutistsā,
āflavourā: āflavorā,
āflavouredā: āflavoredā,
āflavouringā: āflavoringā,
āflavouringsā: āflavoringsā,
āflavourlessā: āflavorlessā,
āflavoursā: āflavorsā,
āflavoursomeā: āflavorsomeā,
āflyer / flierā: āflier / flyerā,
āfoetalā: āfetalā,
āfoetidā: āfetidā,
āfoetusā: āfetusā,
āfoetusesā: āfetusesā,
āformalisationā: āformalizationā,
āformaliseā: āformalizeā,
āformalisedā: āformalizedā,
āformalisesā: āformalizesā,
āformalisingā: āformalizingā,
āfossilisationā: āfossilizationā,
āfossiliseā: āfossilizeā,
āfossilisedā: āfossilizedā,
āfossilisesā: āfossilizesā,
āfossilisingā: āfossilizingā,
āfraternisationā: āfraternizationā,
āfraterniseā: āfraternizeā,
āfraternisedā: āfraternizedā,
āfraternisesā: āfraternizesā,
āfraternisingā: āfraternizingā,
āfulfilā: āfulfillā,
āfulfilmentā: āfulfillmentā,
āfulfilsā: āfulfillsā,
āfunnelledā: āfunneledā,
āfunnellingā: āfunnelingā,
āgalvaniseā: āgalvanizeā,
āgalvanisedā: āgalvanizedā,
āgalvanisesā: āgalvanizesā,
āgalvanisingā: āgalvanizingā,
āgambolledā: āgamboledā,
āgambollingā: āgambolingā,
āgaolā: ājailā,
āgaolbirdā: ājailbirdā,
āgaolbirdsā: ājailbirdsā,
āgaolbreakā: ājailbreakā,
āgaolbreaksā: ājailbreaksā,
āgaoledā: ājailedā,
āgaolerā: ājailerā,
āgaolersā: ājailersā,
āgaolingā: ājailingā,
āgaolsā: ājailsā,
āgassesā: āgasesā,
āgageā: āgaugeā,
āgagedā: āgaugedā,
āgagesā: āgaugesā,
āgagingā: āgaugingā,
āgeneralisationā: āgeneralizationā,
āgeneralisationsā: āgeneralizationsā,
āgeneraliseā: āgeneralizeā,
āgeneralisedā: āgeneralizedā,
āgeneralisesā: āgeneralizesā,
āgeneralisingā: āgeneralizingā,
āghettoiseā: āghettoizeā,
āghettoisedā: āghettoizedā,
āghettoisesā: āghettoizesā,
āghettoisingā: āghettoizingā,
āgipsiesā: āgypsiesā,
āglamoriseā: āglamorizeā,
āglamorisedā: āglamorizedā,
āglamorisesā: āglamorizesā,
āglamorisingā: āglamorizingā,
āglamorā: āglamourā,
āglobalisationā: āglobalizationā,
āglobaliseā: āglobalizeā,
āglobalisedā: āglobalizedā,
āglobalisesā: āglobalizesā,
āglobalisingā: āglobalizingā,
āglueingā: āgluingā,
āgoitreā: āgoiterā,
āgoitresā: āgoitersā,
āgonorrhoeaā: āgonorrheaā,
āgrammeā: āgramā,
āgrammesā: āgramsā,
āgravelledā: āgraveledā,
āgreyā: āgrayā,
āgreyedā: āgrayedā,
āgreyingā: āgrayingā,
āgreyishā: āgrayishā,
āgreynessā: āgraynessā,
āgreysā: āgraysā,
āgrovelledā: āgroveledā,
āgrovellingā: āgrovelingā,
āgroyneā: āgroinā,
āgroynesā: āgroinsā,
āgruellingā: āgruelingā,
āgruellinglyā: āgruelinglyā,
āgryphonā: āgriffinā,
āgryphonsā: āgriffinsā,
āgynaecologicalā: āgynecologicalā,
āgynaecologistā: āgynecologistā,
āgynaecologistsā: āgynecologistsā,
āgynaecologyā: āgynecologyā,
āhaematologicalā: āhematologicalā,
āhaematologistā: āhematologistā,
āhaematologistsā: āhematologistsā,
āhaematologyā: āhematologyā,
āhaemoglobinā: āhemoglobinā,
āhaemophiliaā: āhemophiliaā,
āhaemophiliacā: āhemophiliacā,
āhaemophiliacsā: āhemophiliacsā,
āhaemorrhageā: āhemorrhageā,
āhaemorrhagedā: āhemorrhagedā,
āhaemorrhagesā: āhemorrhagesā,
āhaemorrhagingā: āhemorrhagingā,
āhaemorrhoidsā: āhemorrhoidsā,
āharbourā: āharborā,
āharbouredā: āharboredā,
āharbouringā: āharboringā,
āharboursā: āharborsā,
āharmonisationā: āharmonizationā,
āharmoniseā: āharmonizeā,
āharmonisedā: āharmonizedā,
āharmonisesā: āharmonizesā,
āharmonisingā: āharmonizingā,
āhomoeopathā: āhomeopathā,
āhomoeopathicā: āhomeopathicā,
āhomoeopathsā: āhomeopathsā,
āhomoeopathyā: āhomeopathyā,
āhomogeniseā: āhomogenizeā,
āhomogenisedā: āhomogenizedā,
āhomogenisesā: āhomogenizesā,
āhomogenisingā: āhomogenizingā,
āhonourā: āhonorā,
āhonourableā: āhonorableā,
āhonourablyā: āhonorablyā,
āhonouredā: āhonoredā,
āhonouringā: āhonoringā,
āhonoursā: āhonorsā,
āhospitalisationā: āhospitalizationā,
āhospitaliseā: āhospitalizeā,
āhospitalisedā: āhospitalizedā,
āhospitalisesā: āhospitalizesā,
āhospitalisingā: āhospitalizingā,
āhumaniseā: āhumanizeā,
āhumanisedā: āhumanizedā,
āhumanisesā: āhumanizesā,
āhumanisingā: āhumanizingā,
āhumourā: āhumorā,
āhumouredā: āhumoredā,
āhumouringā: āhumoringā,
āhumourlessā: āhumorlessā,
āhumoursā: āhumorsā,
āhybridiseā: āhybridizeā,
āhybridisedā: āhybridizedā,
āhybridisesā: āhybridizesā,
āhybridisingā: āhybridizingā,
āhypnotiseā: āhypnotizeā,
āhypnotisedā: āhypnotizedā,
āhypnotisesā: āhypnotizesā,
āhypnotisingā: āhypnotizingā,
āhypothesiseā: āhypothesizeā,
āhypothesisedā: āhypothesizedā,
āhypothesisesā: āhypothesizesā,
āhypothesisingā: āhypothesizingā,
āidealisationā: āidealizationā,
āidealiseā: āidealizeā,
āidealisedā: āidealizedā,
āidealisesā: āidealizesā,
āidealisingā: āidealizingā,
āidoliseā: āidolizeā,
āidolisedā: āidolizedā,
āidolisesā: āidolizesā,
āidolisingā: āidolizingā,
āimmobilisationā: āimmobilizationā,
āimmobiliseā: āimmobilizeā,
āimmobilisedā: āimmobilizedā,
āimmobiliserā: āimmobilizerā,
āimmobilisersā: āimmobilizersā,
āimmobilisesā: āimmobilizesā,
āimmobilisingā: āimmobilizingā,
āimmortaliseā: āimmortalizeā,
āimmortalisedā: āimmortalizedā,
āimmortalisesā: āimmortalizesā,
āimmortalisingā: āimmortalizingā,
āimmunisationā: āimmunizationā,
āimmuniseā: āimmunizeā,
āimmunisedā: āimmunizedā,
āimmunisesā: āimmunizesā,
āimmunisingā: āimmunizingā,
āimpanelledā: āimpaneledā,
āimpanellingā: āimpanelingā,
āimperilledā: āimperiledā,
āimperillingā: āimperilingā,
āindividualiseā: āindividualizeā,
āindividualisedā: āindividualizedā,
āindividualisesā: āindividualizesā,
āindividualisingā: āindividualizingā,
āindustrialiseā: āindustrializeā,
āindustrialisedā: āindustrializedā,
āindustrialisesā: āindustrializesā,
āindustrialisingā: āindustrializingā,
āinflexionā: āinflectionā,
āinflexionsā: āinflectionsā,
āinitialiseā: āinitializeā,
āinitialisedā: āinitializedā,
āinitialisesā: āinitializesā,
āinitialisingā: āinitializingā,
āinitialledā: āinitialedā,
āinitiallingā: āinitialingā,
āinstalā: āinstallā,
āinstalmentā: āinstallmentā,
āinstalmentsā: āinstallmentsā,
āinstalsā: āinstallsā,
āinstilā: āinstillā,
āinstilsā: āinstillsā,
āinstitutionalisationā: āinstitutionalizationā,
āinstitutionaliseā: āinstitutionalizeā,
āinstitutionalisedā: āinstitutionalizedā,
āinstitutionalisesā: āinstitutionalizesā,
āinstitutionalisingā: āinstitutionalizingā,
āintellectualiseā: āintellectualizeā,
āintellectualisedā: āintellectualizedā,
āintellectualisesā: āintellectualizesā,
āintellectualisingā: āintellectualizingā,
āinternalisationā: āinternalizationā,
āinternaliseā: āinternalizeā,
āinternalisedā: āinternalizedā,
āinternalisesā: āinternalizesā,
āinternalisingā: āinternalizingā,
āinternationalisationā: āinternationalizationā,
āinternationaliseā: āinternationalizeā,
āinternationalisedā: āinternationalizedā,
āinternationalisesā: āinternationalizesā,
āinternationalisingā: āinternationalizingā,
āionisationā: āionizationā,
āioniseā: āionizeā,
āionisedā: āionizedā,
āioniserā: āionizerā,
āionisersā: āionizersā,
āionisesā: āionizesā,
āionisingā: āionizingā,
āitaliciseā: āitalicizeā,
āitalicisedā: āitalicizedā,
āitalicisesā: āitalicizesā,
āitalicisingā: āitalicizingā,
āitemiseā: āitemizeā,
āitemisedā: āitemizedā,
āitemisesā: āitemizesā,
āitemisingā: āitemizingā,
ājeopardiseā: ājeopardizeā,
ājeopardisedā: ājeopardizedā,
ājeopardisesā: ājeopardizesā,
ājeopardisingā: ājeopardizingā,
ājewelledā: ājeweledā,
ājewellerā: ājewelerā,
ājewellersā: ājewelersā,
ājewelleryā: ājewelryā,
ājudgementā: ājudgmentā,
ākilogrammeā: ākilogramā,
ākilogrammesā: ākilogramsā,
ākilometreā: ākilometerā,
ākilometresā: ākilometersā,
ālabelledā: ālabeledā,
ālabellingā: ālabelingā,
ālabourā: ālaborā,
ālabouredā: ālaboredā,
ālabourerā: ālaborerā,
ālabourersā: ālaborersā,
ālabouringā: ālaboringā,
ālaboursā: ālaborsā,
ālacklustreā: ālacklusterā,
ālegalisationā: ālegalizationā,
ālegaliseā: ālegalizeā,
ālegalisedā: ālegalizedā,
ālegalisesā: ālegalizesā,
ālegalisingā: ālegalizingā,
ālegitimiseā: ālegitimizeā,
ālegitimisedā: ālegitimizedā,
ālegitimisesā: ālegitimizesā,
ālegitimisingā: ālegitimizingā,
āleukaemiaā: āleukemiaā,
ālevelledā: āleveledā,
ālevellerā: ālevelerā,
ālevellersā: ālevelersā,
ālevellingā: ālevelingā,
ālibelledā: ālibeledā,
ālibellingā: ālibelingā,
ālibellousā: ālibelousā,
āliberalisationā: āliberalizationā,
āliberaliseā: āliberalizeā,
āliberalisedā: āliberalizedā,
āliberalisesā: āliberalizesā,
āliberalisingā: āliberalizingā,
ālicenceā: ālicenseā,
ālicencedā: ālicensedā,
ālicencesā: ālicensesā,
ālicencingā: ālicensingā,
ālikeableā: ālikableā,
ālionisationā: ālionizationā,
ālioniseā: ālionizeā,
ālionisedā: ālionizedā,
ālionisesā: ālionizesā,
ālionisingā: ālionizingā,
āliquidiseā: āliquidizeā,
āliquidisedā: āliquidizedā,
āliquidiserā: āliquidizerā,
āliquidisersā: āliquidizersā,
āliquidisesā: āliquidizesā,
āliquidisingā: āliquidizingā,
ālitreā: āliterā,
ālitresā: ālitersā,
ālocaliseā: ālocalizeā,
ālocalisedā: ālocalizedā,
ālocalisesā: ālocalizesā,
ālocalisingā: ālocalizingā,
ālouvreā: ālouverā,
ālouvredā: ālouveredā,
ālouvresā: ālouversā,
ālustreā: ālusterā,
āmagnetiseā: āmagnetizeā,
āmagnetisedā: āmagnetizedā,
āmagnetisesā: āmagnetizesā,
āmagnetisingā: āmagnetizingā,
āmanoeuvrabilityā: āmaneuverabilityā,
āmanoeuvrableā: āmaneuverableā,
āmanoeuvreā: āmaneuverā,
āmanoeuvredā: āmaneuveredā,
āmanoeuvresā: āmaneuversā,
āmanoeuvringā: āmaneuveringā,
āmanoeuvringsā: āmaneuveringsā,
āmarginalisationā: āmarginalizationā,
āmarginaliseā: āmarginalizeā,
āmarginalisedā: āmarginalizedā,
āmarginalisesā: āmarginalizesā,
āmarginalisingā: āmarginalizingā,
āmarshalledā: āmarshaledā,
āmarshallingā: āmarshalingā,
āmarvelledā: āmarveledā,
āmarvellingā: āmarvelingā,
āmarvellousā: āmarvelousā,
āmarvellouslyā: āmarvelouslyā,
āmaterialisationā: āmaterializationā,
āmaterialiseā: āmaterializeā,
āmaterialisedā: āmaterializedā,
āmaterialisesā: āmaterializesā,
āmaterialisingā: āmaterializingā,
āmaximisationā: āmaximizationā,
āmaximiseā: āmaximizeā,
āmaximisedā: āmaximizedā,
āmaximisesā: āmaximizesā,
āmaximisingā: āmaximizingā,
āmeagreā: āmeagerā,
āmechanisationā: āmechanizationā,
āmechaniseā: āmechanizeā,
āmechanisedā: āmechanizedā,
āmechanisesā: āmechanizesā,
āmechanisingā: āmechanizingā,
āmediaevalā: āmedievalā,
āmemorialiseā: āmemorializeā,
āmemorialisedā: āmemorializedā,
āmemorialisesā: āmemorializesā,
āmemorialisingā: āmemorializingā,
āmemoriseā: āmemorizeā,
āmemorisedā: āmemorizedā,
āmemorisesā: āmemorizesā,
āmemorisingā: āmemorizingā,
āmesmeriseā: āmesmerizeā,
āmesmerisedā: āmesmerizedā,
āmesmerisesā: āmesmerizesā,
āmesmerisingā: āmesmerizingā,
āmetaboliseā: āmetabolizeā,
āmetabolisedā: āmetabolizedā,
āmetabolisesā: āmetabolizesā,
āmetabolisingā: āmetabolizingā,
āmetreā: āmeterā,
āmetresā: āmetersā,
āmicrometreā: āmicrometerā,
āmicrometresā: āmicrometersā,
āmilitariseā: āmilitarizeā,
āmilitarisedā: āmilitarizedā,
āmilitarisesā: āmilitarizesā,
āmilitarisingā: āmilitarizingā,
āmilligrammeā: āmilligramā,
āmilligrammesā: āmilligramsā,
āmillilitreā: āmilliliterā,
āmillilitresā: āmillilitersā,
āmillimetreā: āmillimeterā,
āmillimetresā: āmillimetersā,
āminiaturisationā: āminiaturizationā,
āminiaturiseā: āminiaturizeā,
āminiaturisedā: āminiaturizedā,
āminiaturisesā: āminiaturizesā,
āminiaturisingā: āminiaturizingā,
āminibussesā: āminibusesā,
āminimiseā: āminimizeā,
āminimisedā: āminimizedā,
āminimisesā: āminimizesā,
āminimisingā: āminimizingā,
āmisbehaviourā: āmisbehaviorā,
āmisdemeanourā: āmisdemeanorā,
āmisdemeanoursā: āmisdemeanorsā,
āmisspeltā: āmisspelledā,
āmitreā: āmiterā,
āmitresā: āmitersā,
āmobilisationā: āmobilizationā,
āmobiliseā: āmobilizeā,
āmobilisedā: āmobilizedā,
āmobilisesā: āmobilizesā,
āmobilisingā: āmobilizingā,
āmodelledā: āmodeledā,
āmodellerā: āmodelerā,
āmodellersā: āmodelersā,
āmodellingā: āmodelingā,
āmoderniseā: āmodernizeā,
āmodernisedā: āmodernizedā,
āmodernisesā: āmodernizesā,
āmodernisingā: āmodernizingā,
āmoisturiseā: āmoisturizeā,
āmoisturisedā: āmoisturizedā,
āmoisturiserā: āmoisturizerā,
āmoisturisersā: āmoisturizersā,
āmoisturisesā: āmoisturizesā,
āmoisturisingā: āmoisturizingā,
āmonologueā: āmonologā,
āmonologuesā: āmonologsā,
āmonopolisationā: āmonopolizationā,
āmonopoliseā: āmonopolizeā,
āmonopolisedā: āmonopolizedā,
āmonopolisesā: āmonopolizesā,
āmonopolisingā: āmonopolizingā,
āmoraliseā: āmoralizeā,
āmoralisedā: āmoralizedā,
āmoralisesā: āmoralizesā,
āmoralisingā: āmoralizingā,
āmotorisedā: āmotorizedā,
āmouldā: āmoldā,
āmouldedā: āmoldedā,
āmoulderā: āmolderā,
āmoulderedā: āmolderedā,
āmoulderingā: āmolderingā,
āmouldersā: āmoldersā,
āmouldierā: āmoldierā,
āmouldiestā: āmoldiestā,
āmouldingā: āmoldingā,
āmouldingsā: āmoldingsā,
āmouldsā: āmoldsā,
āmouldyā: āmoldyā,
āmoultā: āmoltā,
āmoultedā: āmoltedā,
āmoultingā: āmoltingā,
āmoultsā: āmoltsā,
āmoustacheā: āmustacheā,
āmoustachedā: āmustachedā,
āmoustachesā: āmustachesā,
āmoustachioedā: āmustachioedā,
āmulticolouredā: āmulticoloredā,
ānationalisationā: ānationalizationā,
ānationalisationsā: ānationalizationsā,
ānationaliseā: ānationalizeā,
ānationalisedā: ānationalizedā,
ānationalisesā: ānationalizesā,
ānationalisingā: ānationalizingā,
ānaturalisationā: ānaturalizationā,
ānaturaliseā: ānaturalizeā,
ānaturalisedā: ānaturalizedā,
ānaturalisesā: ānaturalizesā,
ānaturalisingā: ānaturalizingā,
āneighbourā: āneighborā,
āneighbourhoodā: āneighborhoodā,
āneighbourhoodsā: āneighborhoodsā,
āneighbouringā: āneighboringā,
āneighbourlinessā: āneighborlinessā,
āneighbourlyā: āneighborlyā,
āneighboursā: āneighborsā,
āneutralisationā: āneutralizationā,
āneutraliseā: āneutralizeā,
āneutralisedā: āneutralizedā,
āneutralisesā: āneutralizesā,
āneutralisingā: āneutralizingā,
ānormalisationā: ānormalizationā,
ānormaliseā: ānormalizeā,
ānormalisedā: ānormalizedā,
ānormalisesā: ānormalizesā,
ānormalisingā: ānormalizingā,
āodourā: āodorā,
āodourlessā: āodorlessā,
āodoursā: āodorsā,
āoesophagusā: āesophagusā,
āoesophagusesā: āesophagusesā,
āoestrogenā: āestrogenā,
āoffenceā: āoffenseā,
āoffencesā: āoffensesā,
āomeletteā: āomeletā,
āomelettesā: āomeletsā,
āoptimiseā: āoptimizeā,
āoptimisedā: āoptimizedā,
āoptimisesā: āoptimizesā,
āoptimisingā: āoptimizingā,
āorganisationā: āorganizationā,
āorganisationalā: āorganizationalā,
āorganisationsā: āorganizationsā,
āorganiseā: āorganizeā,
āorganisedā: āorganizedā,
āorganiserā: āorganizerā,
āorganisersā: āorganizersā,
āorganisesā: āorganizesā,
āorganisingā: āorganizingā,
āorthopaedicā: āorthopedicā,
āorthopaedicsā: āorthopedicsā,
āostraciseā: āostracizeā,
āostracisedā: āostracizedā,
āostracisesā: āostracizesā,
āostracisingā: āostracizingā,
āoutmanoeuvreā: āoutmaneuverā,
āoutmanoeuvredā: āoutmaneuveredā,
āoutmanoeuvresā: āoutmaneuversā,
āoutmanoeuvringā: āoutmaneuveringā,
āoveremphasiseā: āoveremphasizeā,
āoveremphasisedā: āoveremphasizedā,
āoveremphasisesā: āoveremphasizesā,
āoveremphasisingā: āoveremphasizingā,
āoxidisationā: āoxidizationā,
āoxidiseā: āoxidizeā,
āoxidisedā: āoxidizedā,
āoxidisesā: āoxidizesā,
āoxidisingā: āoxidizingā,
āpaederastā: āpederastā,
āpaederastsā: āpederastsā,
āpaediatricā: āpediatricā,
āpaediatricianā: āpediatricianā,
āpaediatriciansā: āpediatriciansā,
āpaediatricsā: āpediatricsā,
āpaedophileā: āpedophileā,
āpaedophilesā: āpedophilesā,
āpaedophiliaā: āpedophiliaā,
āpalaeolithicā: āpaleolithicā,
āpalaeontologistā: āpaleontologistā,
āpalaeontologistsā: āpaleontologistsā,
āpalaeontologyā: āpaleontologyā,
āpanelledā: āpaneledā,
āpanellingā: āpanelingā,
āpanellistā: āpanelistā,
āpanellistsā: āpanelistsā,
āparalyseā: āparalyzeā,
āparalysedā: āparalyzedā,
āparalysesā: āparalyzesā,
āparalysingā: āparalyzingā,
āparcelledā: āparceledā,
āparcellingā: āparcelingā,
āparlourā: āparlorā,
āparloursā: āparlorsā,
āparticulariseā: āparticularizeā,
āparticularisedā: āparticularizedā,
āparticularisesā: āparticularizesā,
āparticularisingā: āparticularizingā,
āpassivisationā: āpassivizationā,
āpassiviseā: āpassivizeā,
āpassivisedā: āpassivizedā,
āpassivisesā: āpassivizesā,
āpassivisingā: āpassivizingā,
āpasteurisationā: āpasteurizationā,
āpasteuriseā: āpasteurizeā,
āpasteurisedā: āpasteurizedā,
āpasteurisesā: āpasteurizesā,
āpasteurisingā: āpasteurizingā,
āpatroniseā: āpatronizeā,
āpatronisedā: āpatronizedā,
āpatronisesā: āpatronizesā,
āpatronisingā: āpatronizingā,
āpatronisinglyā: āpatronizinglyā,
āpedalledā: āpedaledā,
āpedallingā: āpedalingā,
āpedestrianisationā: āpedestrianizationā,
āpedestrianiseā: āpedestrianizeā,
āpedestrianisedā: āpedestrianizedā,
āpedestrianisesā: āpedestrianizesā,
āpedestrianisingā: āpedestrianizingā,
āpenaliseā: āpenalizeā,
āpenalisedā: āpenalizedā,
āpenalisesā: āpenalizesā,
āpenalisingā: āpenalizingā,
āpencilledā: āpenciledā,
āpencillingā: āpencilingā,
āpersonaliseā: āpersonalizeā,
āpersonalisedā: āpersonalizedā,
āpersonalisesā: āpersonalizesā,
āpersonalisingā: āpersonalizingā,
āpharmacopoeiaā: āpharmacopeiaā,
āpharmacopoeiasā: āpharmacopeiasā,
āphilosophiseā: āphilosophizeā,
āphilosophisedā: āphilosophizedā,
āphilosophisesā: āphilosophizesā,
āphilosophisingā: āphilosophizingā,
āphiltreā: āfilterā,
āphiltresā: āfiltersā,
āphoneyā: āphonyā,
āplagiariseā: āplagiarizeā,
āplagiarisedā: āplagiarizedā,
āplagiarisesā: āplagiarizesā,
āplagiarisingā: āplagiarizingā,
āploughā: āplowā,
āploughedā: āplowedā,
āploughingā: āplowingā,
āploughmanā: āplowmanā,
āploughmenā: āplowmenā,
āploughsā: āplowsā,
āploughshareā: āplowshareā,
āploughsharesā: āplowsharesā,
āpolarisationā: āpolarizationā,
āpolariseā: āpolarizeā,
āpolarisedā: āpolarizedā,
āpolarisesā: āpolarizesā,
āpolarisingā: āpolarizingā,
āpoliticisationā: āpoliticizationā,
āpoliticiseā: āpoliticizeā,
āpoliticisedā: āpoliticizedā,
āpoliticisesā: āpoliticizesā,
āpoliticisingā: āpoliticizingā,
āpopularisationā: āpopularizationā,
āpopulariseā: āpopularizeā,
āpopularisedā: āpopularizedā,
āpopularisesā: āpopularizesā,
āpopularisingā: āpopularizingā,
āpouffeā: āpoufā,
āpouffesā: āpoufsā,
āpractiseā: āpracticeā,
āpractisedā: āpracticedā,
āpractisesā: āpracticesā,
āpractisingā: āpracticingā,
āpraesidiumā: āpresidiumā,
āpraesidiumsā: āpresidiumsā,
āpressurisationā: āpressurizationā,
āpressuriseā: āpressurizeā,
āpressurisedā: āpressurizedā,
āpressurisesā: āpressurizesā,
āpressurisingā: āpressurizingā,
āpretenceā: āpretenseā,
āpretencesā: āpretensesā,
āprimaevalā: āprimevalā,
āprioritisationā: āprioritizationā,
āprioritiseā: āprioritizeā,
āprioritisedā: āprioritizedā,
āprioritisesā: āprioritizesā,
āprioritisingā: āprioritizingā,
āprivatisationā: āprivatizationā,
āprivatisationsā: āprivatizationsā,
āprivatiseā: āprivatizeā,
āprivatisedā: āprivatizedā,
āprivatisesā: āprivatizesā,
āprivatisingā: āprivatizingā,
āprofessionalisationā: āprofessionalizationā,
āprofessionaliseā: āprofessionalizeā,
āprofessionalisedā: āprofessionalizedā,
āprofessionalisesā: āprofessionalizesā,
āprofessionalisingā: āprofessionalizingā,
āprogrammeā: āprogramā,
āprogrammesā: āprogramsā,
āprologueā: āprologā,
āprologuesā: āprologsā,
āpropagandiseā: āpropagandizeā,
āpropagandisedā: āpropagandizedā,
āpropagandisesā: āpropagandizesā,
āpropagandisingā: āpropagandizingā,
āproselytiseā: āproselytizeā,
āproselytisedā: āproselytizedā,
āproselytiserā: āproselytizerā,
āproselytisersā: āproselytizersā,
āproselytisesā: āproselytizesā,
āproselytisingā: āproselytizingā,
āpsychoanalyseā: āpsychoanalyzeā,
āpsychoanalysedā: āpsychoanalyzedā,
āpsychoanalysesā: āpsychoanalyzesā,
āpsychoanalysingā: āpsychoanalyzingā,
āpubliciseā: āpublicizeā,
āpublicisedā: āpublicizedā,
āpublicisesā: āpublicizesā,
āpublicisingā: āpublicizingā,
āpulverisationā: āpulverizationā,
āpulveriseā: āpulverizeā,
āpulverisedā: āpulverizedā,
āpulverisesā: āpulverizesā,
āpulverisingā: āpulverizingā,
āpummelledā: āpummelā,
āpummellingā: āpummeledā,
āpyjamaā: āpajamaā,
āpyjamasā: āpajamasā,
āpzazzā: āpizzazzā,
āquarrelledā: āquarreledā,
āquarrellingā: āquarrelingā,
āradicaliseā: āradicalizeā,
āradicalisedā: āradicalizedā,
āradicalisesā: āradicalizesā,
āradicalisingā: āradicalizingā,
ārancourā: ārancorā,
ārandomiseā: ārandomizeā,
ārandomisedā: ārandomizedā,
ārandomisesā: ārandomizesā,
ārandomisingā: ārandomizingā,
ārationalisationā: ārationalizationā,
ārationalisationsā: ārationalizationsā,
ārationaliseā: ārationalizeā,
ārationalisedā: ārationalizedā,
ārationalisesā: ārationalizesā,
ārationalisingā: ārationalizingā,
āravelledā: āraveledā,
āravellingā: āravelingā,
ārealisableā: ārealizableā,
ārealisationā: ārealizationā,
ārealisationsā: ārealizationsā,
ārealiseā: ārealizeā,
ārealisedā: ārealizedā,
ārealisesā: ārealizesā,
ārealisingā: ārealizingā,
ārecognisableā: ārecognizableā,
ārecognisablyā: ārecognizablyā,
ārecognisanceā: ārecognizanceā,
ārecogniseā: ārecognizeā,
ārecognisedā: ārecognizedā,
ārecognisesā: ārecognizesā,
ārecognisingā: ārecognizingā,
āreconnoitreā: āreconnoiterā,
āreconnoitredā: āreconnoiteredā,
āreconnoitresā: āreconnoitersā,
āreconnoitringā: āreconnoiteringā,
ārefuelledā: ārefueledā,
ārefuellingā: ārefuelingā,
āregularisationā: āregularizationā,
āregulariseā: āregularizeā,
āregularisedā: āregularizedā,
āregularisesā: āregularizesā,
āregularisingā: āregularizingā,
āremodelledā: āremodeledā,
āremodellingā: āremodelingā,
āremouldā: āremoldā,
āremouldedā: āremoldedā,
āremouldingā: āremoldingā,
āremouldsā: āremoldsā,
āreorganisationā: āreorganizationā,
āreorganisationsā: āreorganizationsā,
āreorganiseā: āreorganizeā,
āreorganisedā: āreorganizedā,
āreorganisesā: āreorganizesā,
āreorganisingā: āreorganizingā,
ārevelledā: āreveledā,
ārevellerā: ārevelerā,
ārevellersā: ārevelersā,
ārevellingā: ārevelingā,
ārevitaliseā: ārevitalizeā,
ārevitalisedā: ārevitalizedā,
ārevitalisesā: ārevitalizesā,
ārevitalisingā: ārevitalizingā,
ārevolutioniseā: ārevolutionizeā,
ārevolutionisedā: ārevolutionizedā,
ārevolutionisesā: ārevolutionizesā,
ārevolutionisingā: ārevolutionizingā,
ārhapsodiseā: ārhapsodizeā,
ārhapsodisedā: ārhapsodizedā,
ārhapsodisesā: ārhapsodizesā,
ārhapsodisingā: ārhapsodizingā,
ārigourā: ārigorā,
ārigoursā: ārigorsā,
āritualisedā: āritualizedā,
ārivalledā: ārivaledā,
ārivallingā: ārivalingā,
āromanticiseā: āromanticizeā,
āromanticisedā: āromanticizedā,
āromanticisesā: āromanticizesā,
āromanticisingā: āromanticizingā,
ārumourā: ārumorā,
ārumouredā: ārumoredā,
ārumoursā: ārumorsā,
āsabreā: āsaberā,
āsabresā: āsabersā,
āsaltpetreā: āsaltpeterā,
āsanitiseā: āsanitizeā,
āsanitisedā: āsanitizedā,
āsanitisesā: āsanitizesā,
āsanitisingā: āsanitizingā,
āsatiriseā: āsatirizeā,
āsatirisedā: āsatirizedā,
āsatirisesā: āsatirizesā,
āsatirisingā: āsatirizingā,
āsaviourā: āsaviorā,
āsavioursā: āsaviorsā,
āsavourā: āsavorā,
āsavouredā: āsavoredā,
āsavouriesā: āsavoriesā,
āsavouringā: āsavoringā,
āsavoursā: āsavorsā,
āsavouryā: āsavoryā,
āscandaliseā: āscandalizeā,
āscandalisedā: āscandalizedā,
āscandalisesā: āscandalizesā,
āscandalisingā: āscandalizingā,
āscepticā: āskepticā,
āscepticalā: āskepticalā,
āscepticallyā: āskepticallyā,
āscepticismā: āskepticismā,
āscepticsā: āskepticsā,
āsceptreā: āscepterā,
āsceptresā: āsceptersā,
āscrutiniseā: āscrutinizeā,
āscrutinisedā: āscrutinizedā,
āscrutinisesā: āscrutinizesā,
āscrutinisingā: āscrutinizingā,
āsecularisationā: āsecularizationā,
āseculariseā: āsecularizeā,
āsecularisedā: āsecularizedā,
āsecularisesā: āsecularizesā,
āsecularisingā: āsecularizingā,
āsensationaliseā: āsensationalizeā,
āsensationalisedā: āsensationalizedā,
āsensationalisesā: āsensationalizesā,
āsensationalisingā: āsensationalizingā,
āsensitiseā: āsensitizeā,
āsensitisedā: āsensitizedā,
āsensitisesā: āsensitizesā,
āsensitisingā: āsensitizingā,
āsentimentaliseā: āsentimentalizeā,
āsentimentalisedā: āsentimentalizedā,
āsentimentalisesā: āsentimentalizesā,
āsentimentalisingā: āsentimentalizingā,
āsepulchreā: āsepulcherā,
āsepulchresā: āsepulchersā,
āserialisationā: āserializationā,
āserialisationsā: āserializationsā,
āserialiseā: āserializeā,
āserialisedā: āserializedā,
āserialisesā: āserializesā,
āserialisingā: āserializingā,
āsermoniseā: āsermonizeā,
āsermonisedā: āsermonizedā,
āsermonisesā: āsermonizesā,
āsermonisingā: āsermonizingā,
āsheikhā: āsheikā,
āshovelledā: āshoveledā,
āshovellingā: āshovelingā,
āshrivelledā: āshriveledā,
āshrivellingā: āshrivelingā,
āsignaliseā: āsignalizeā,
āsignalisedā: āsignalizedā,
āsignalisesā: āsignalizesā,
āsignalisingā: āsignalizingā,
āsignalledā: āsignaledā,
āsignallingā: āsignalingā,
āsmoulderā: āsmolderā,
āsmoulderedā: āsmolderedā,
āsmoulderingā: āsmolderingā,
āsmouldersā: āsmoldersā,
āsnivelledā: āsniveledā,
āsnivellingā: āsnivelingā,
āsnorkelledā: āsnorkeledā,
āsnorkellingā: āsnorkelingā,
āsnowploughā: āsnowplowā,
āsnowploughsā: āsnowplowā,
āsocialisationā: āsocializationā,
āsocialiseā: āsocializeā,
āsocialisedā: āsocializedā,
āsocialisesā: āsocializesā,
āsocialisingā: āsocializingā,
āsodomiseā: āsodomizeā,
āsodomisedā: āsodomizedā,
āsodomisesā: āsodomizesā,
āsodomisingā: āsodomizingā,
āsolemniseā: āsolemnizeā,
āsolemnisedā: āsolemnizedā,
āsolemnisesā: āsolemnizesā,
āsolemnisingā: āsolemnizingā,
āsombreā: āsomberā,
āspecialisationā: āspecializationā,
āspecialisationsā: āspecializationsā,
āspecialiseā: āspecializeā,
āspecialisedā: āspecializedā,
āspecialisesā: āspecializesā,
āspecialisingā: āspecializingā,
āspectreā: āspecterā,
āspectresā: āspectersā,
āspiralledā: āspiraledā,
āspirallingā: āspiralingā,
āsplendourā: āsplendorā,
āsplendoursā: āsplendorsā,
āsquirrelledā: āsquirreledā,
āsquirrellingā: āsquirrelingā,
āstabilisationā: āstabilizationā,
āstabiliseā: āstabilizeā,
āstabilisedā: āstabilizedā,
āstabiliserā: āstabilizerā,
āstabilisersā: āstabilizersā,
āstabilisesā: āstabilizesā,
āstabilisingā: āstabilizingā,
āstandardisationā: āstandardizationā,
āstandardiseā: āstandardizeā,
āstandardisedā: āstandardizedā,
āstandardisesā: āstandardizesā,
āstandardisingā: āstandardizingā,
āstencilledā: āstenciledā,
āstencillingā: āstencilingā,
āsterilisationā: āsterilizationā,
āsterilisationsā: āsterilizationsā,
āsteriliseā: āsterilizeā,
āsterilisedā: āsterilizedā,
āsteriliserā: āsterilizerā,
āsterilisersā: āsterilizersā,
āsterilisesā: āsterilizesā,
āsterilisingā: āsterilizingā,
āstigmatisationā: āstigmatizationā,
āstigmatiseā: āstigmatizeā,
āstigmatisedā: āstigmatizedā,
āstigmatisesā: āstigmatizesā,
āstigmatisingā: āstigmatizingā,
āstoreyā: āstoryā,
āstoreysā: āstoriesā,
āsubsidisationā: āsubsidizationā,
āsubsidiseā: āsubsidizeā,
āsubsidisedā: āsubsidizedā,
āsubsidiserā: āsubsidizerā,
āsubsidisersā: āsubsidizersā,
āsubsidisesā: āsubsidizesā,
āsubsidisingā: āsubsidizingā,
āsuccourā: āsuccorā,
āsuccouredā: āsuccoredā,
āsuccouringā: āsuccoringā,
āsuccoursā: āsuccorsā,
āsulphateā: āsulfateā,
āsulphatesā: āsulfatesā,
āsulphideā: āsulfideā,
āsulphidesā: āsulfidesā,
āsulphurā: āsulfurā,
āsulphurousā: āsulfurousā,
āsummariseā: āsummarizeā,
āsummarisedā: āsummarizedā,
āsummarisesā: āsummarizesā,
āsummarisingā: āsummarizingā,
āswivelledā: āswiveledā,
āswivellingā: āswivelingā,
āsymboliseā: āsymbolizeā,
āsymbolisedā: āsymbolizedā,
āsymbolisesā: āsymbolizesā,
āsymbolisingā: āsymbolizingā,
āsympathiseā: āsympathizeā,
āsympathisedā: āsympathizedā,
āsympathiserā: āsympathizerā,
āsympathisersā: āsympathizersā,
āsympathisesā: āsympathizesā,
āsympathisingā: āsympathizingā,
āsynchronisationā: āsynchronizationā,
āsynchroniseā: āsynchronizeā,
āsynchronisedā: āsynchronizedā,
āsynchronisesā: āsynchronizesā,
āsynchronisingā: āsynchronizingā,
āsynthesiseā: āsynthesizeā,
āsynthesisedā: āsynthesizedā,
āsynthesiserā: āsynthesizerā,
āsynthesisersā: āsynthesizersā,
āsynthesisesā: āsynthesizesā,
āsynthesisingā: āsynthesizingā,
āsyphonā: āsiphonā,
āsyphonedā: āsiphonedā,
āsyphoningā: āsiphoningā,
āsyphonsā: āsiphonsā,
āsystematisationā: āsystematizationā,
āsystematiseā: āsystematizeā,
āsystematisedā: āsystematizedā,
āsystematisesā: āsystematizesā,
āsystematisingā: āsystematizingā,
ātantaliseā: ātantalizeā,
ātantalisedā: ātantalizedā,
ātantalisesā: ātantalizesā,
ātantalisingā: ātantalizingā,
ātantalisinglyā: ātantalizinglyā,
ātasselledā: ātasseledā,
ātechnicolourā: ātechnicolorā,
ātemporiseā: ātemporizeā,
ātemporisedā: ātemporizedā,
ātemporisesā: ātemporizesā,
ātemporisingā: ātemporizingā,
ātenderiseā: ātenderizeā,
ātenderisedā: ātenderizedā,
ātenderisesā: ātenderizesā,
ātenderisingā: ātenderizingā,
āterroriseā: āterrorizeā,
āterrorisedā: āterrorizedā,
āterrorisesā: āterrorizesā,
āterrorisingā: āterrorizingā,
ātheatreā: ātheaterā,
ātheatregoerā: ātheatergoerā,
ātheatregoersā: ātheatergoersā,
ātheatresā: ātheatersā,
ātheoriseā: ātheorizeā,
ātheorisedā: ātheorizedā,
ātheorisesā: ātheorizesā,
ātheorisingā: ātheorizingā,
ātonneā: ātonā,
ātonnesā: ātonsā,
ātowelledā: ātoweledā,
ātowellingā: ātowelingā,
ātoxaemiaā: ātoxemiaā,
ātranquilliseā: ātranquilizeā,
ātranquillisedā: ātranquilizedā,
ātranquilliserā: ātranquilizerā,
ātranquillisersā: ātranquilizersā,
ātranquillisesā: ātranquilizesā,
ātranquillisingā: ātranquilizingā,
ātranquillityā: ātranquilityā,
ātranquillizeā: ātranquilizeā,
ātranquillizedā: ātranquilizedā,
ātranquillizerā: ātranquilizerā,
ātranquillizersā: ātranquilizersā,
ātranquillizesā: ātranquilizesā,
ātranquillizingā: ātranquilizingā,
ātranquillyā: ātranquilityā,
ātransistorisedā: ātransistorizedā,
ātraumatiseā: ātraumatizeā,
ātraumatisedā: ātraumatizedā,
ātraumatisesā: ātraumatizesā,
ātraumatisingā: ātraumatizingā,
ātravelledā: ātraveledā,
ātravellerā: ātravelerā,
ātravellersā: ātravelersā,
ātravellingā: ātravelingā,
ātravelogā: ātravelogueā,
ātravelogsā: ātraveloguesā,
ātrialledā: ātrialedā,
ātriallingā: ātrialingā,
ātricolourā: ātricolorā,
ātricoloursā: ātricolorsā,
ātrivialiseā: ātrivializeā,
ātrivialisedā: ātrivializedā,
ātrivialisesā: ātrivializesā,
ātrivialisingā: ātrivializingā,
ātumourā: ātumorā,
ātumoursā: ātumorsā,
ātunnelledā: ātunneledā,
ātunnellingā: ātunnelingā,
ātyranniseā: ātyrannizeā,
ātyrannisedā: ātyrannizedā,
ātyrannisesā: ātyrannizesā,
ātyrannisingā: ātyrannizingā,
ātyreā: ātireā,
ātyresā: ātiresā,
āunauthorisedā: āunauthorizedā,
āuncivilisedā: āuncivilizedā,
āunderutilisedā: āunderutilizedā,
āunequalledā: āunequaledā,
āunfavourableā: āunfavorableā,
āunfavourablyā: āunfavorablyā,
āunionisationā: āunionizationā,
āunioniseā: āunionizeā,
āunionisedā: āunionizedā,
āunionisesā: āunionizesā,
āunionisingā: āunionizingā,
āunorganisedā: āunorganizedā,
āunravelledā: āunraveledā,
āunravellingā: āunravelingā,
āunrecognisableā: āunrecognizableā,
āunrecognisedā: āunrecognizedā,
āunrivalledā: āunrivaledā,
āunsavouryā: āunsavoryā,
āuntrammelledā: āuntrammeledā,
āurbanisationā: āurbanizationā,
āurbaniseā: āurbanizeā,
āurbanisedā: āurbanizedā,
āurbanisesā: āurbanizesā,
āurbanisingā: āurbanizingā,
āutilisableā: āutilizableā,
āutilisationā: āutilizationā,
āutiliseā: āutilizeā,
āutilisedā: āutilizedā,
āutilisesā: āutilizesā,
āutilisingā: āutilizingā,
āvalourā: āvalorā,
āvandaliseā: āvandalizeā,
āvandalisedā: āvandalizedā,
āvandalisesā: āvandalizesā,
āvandalisingā: āvandalizingā,
āvaporisationā: āvaporizationā,
āvaporiseā: āvaporizeā,
āvaporisedā: āvaporizedā,
āvaporisesā: āvaporizesā,
āvaporisingā: āvaporizingā,
āvapourā: āvaporā,
āvapoursā: āvaporsā,
āverbaliseā: āverbalizeā,
āverbalisedā: āverbalizedā,
āverbalisesā: āverbalizesā,
āverbalisingā: āverbalizingā,
āvictimisationā: āvictimizationā,
āvictimiseā: āvictimizeā,
āvictimisedā: āvictimizedā,
āvictimisesā: āvictimizesā,
āvictimisingā: āvictimizingā,
āvideodiscā: āvideodiskā,
āvideodiscsā: āvideodisksā,
āvigourā: āvigorā,
āvisualisationā: āvisualizationā,
āvisualisationsā: āvisualizationsā,
āvisualiseā: āvisualizeā,
āvisualisedā: āvisualizedā,
āvisualisesā: āvisualizesā,
āvisualisingā: āvisualizingā,
āvocalisationā: āvocalizationā,
āvocalisationsā: āvocalizationsā,
āvocaliseā: āvocalizeā,
āvocalisedā: āvocalizedā,
āvocalisesā: āvocalizesā,
āvocalisingā: āvocalizingā,
āvulcanisedā: āvulcanizedā,
āvulgarisationā: āvulgarizationā,
āvulgariseā: āvulgarizeā,
āvulgarisedā: āvulgarizedā,
āvulgarisesā: āvulgarizesā,
āvulgarisingā: āvulgarizingā,
āwaggonā: āwagonā,
āwaggonsā: āwagonsā,
āwatercolourā: āwatercolorā,
āwatercoloursā: āwatercolorsā,
āweaselledā: āweaseledā,
āweasellingā: āweaselingā,
āwesternisationā: āwesternizationā,
āwesterniseā: āwesternizeā,
āwesternisedā: āwesternizedā,
āwesternisesā: āwesternizesā,
āwesternisingā: āwesternizingā,
āwomaniseā: āwomanizeā,
āwomanisedā: āwomanizedā,
āwomaniserā: āwomanizerā,
āwomanisersā: āwomanizersā,
āwomanisesā: āwomanizesā,
āwomanisingā: āwomanizingā,
āwoollenā: āwoolenā,
āwoollensā: āwoolensā,
āwoolliesā: āwooliesā,
āwoollyā: āwoolyā,
āworshippedā: āworshipedā,
āworshippingā: āworshipingā,
āworshipperā: āworshiperā,
āyodelledā: āyodeledā,
āyodellingā: āyodelingā,
āyoghourtā: āyogurtā,
āyoghourtsā: āyogurtsā,
āyoghurtā: āyogurtā,
āyoghurtsā: āyogurtsā,
āmhmā: āhmmā,
āmmmā: āhmmā
}
./utils/preprocess_text.py
import re
import unicodedata
from zhconv import convert
from typing import List
from .english import EnglishTextNormalizer
from .basic import BasicTextNormalizer
def read_transcription(folder):
import csv
second_column_data = []
with open(folder, ārā, encoding=ālatin-1ā) as file:
reader = csv.reader(file)
for row in reader:
if len(row) > 1: # ē”®äæč³å°ęäø¤åę°ę®
second_column_data.append(row[1])
return second_column_data
def normalize_chinese(text):
punctuation = ā!,.;:?ćļ¼ļ¼ćļ¼ļ¼ļ¼ā
if isinstance(text, str):
text = re.sub(rā[{}]+ā.format(punctuation), āā, text).strip() # å é¤ę ē¹ē¬¦å·
text = text.replace(ā ā, āā) # å é¤ē©ŗē½
text = convert(text, āzh-cnā) # ē¹ä½č½¬ē®ä½
return text
elif isinstance(text, list):
result_text = []
for t in text:
t = re.sub(rā[{}]+ā.format(punctuation), āā, t).strip() # å é¤ę ē¹ē¬¦å·
text = text.replace(ā ā, āā) # å é¤ē©ŗē½
text = convert(text, āzh-cnā) # ē¹ä½č½¬ē®ä½
result_text.append(t)
return result_text
else:
raise Exception(fāäøęÆę评类å{type(text)}ā)
def normalize_japanese_and_korean(text):
# 转ę¢äøŗå
Øč§å符
text = unicodedata.normalize(āNFKCā, text)
# å é¤ę ē¹ē¬¦å·å空ē½
text = re.sub(r'[^\w\s]', '', text)
text = re.sub(r'\s+', '', text)
return text
ęÆęč±čÆļ¼äøéØé对č±ęęę¬čæč”ę åå
def normalize_texts_english(texts: List[str]) -> List[str]:
# å建 EnglishTextNormalizer å®ä¾
normalizer = EnglishTextNormalizer()
# 对ęÆäøŖęę¬åŗēØę åå
normalized_texts = [normalizer(text) for text in texts]
return normalized_texts
ęÆęé¤äŗäøę„é©ä¹å¤ēå ¶ä»čÆčØļ¼ä½ęÆå¤ēč±čÆå»ŗč®®ä½æēØnormalize_texts
def normalize_texts_multi_language(texts: List[str]) -> List[str]:
# å建 EnglishTextNormalizer å®ä¾
normalizer = BasicTextNormalizer()
# 对ęÆäøŖęę¬åŗēØę åå
normalized_texts = [normalizer(text) for text in texts]
return normalized_texts
ęÆęäøę
def normalize_texts_chinese(texts):
# ę åå缩å
texts = [normalize_chinese(text) for text in texts] # å»ęę ē¹ē¬¦å·
return texts
ęÆęę„čÆćé©čÆ
def normalize_texts_japanese_korean(texts):
texts = [normalize_japanese_and_korean(text) for text in texts]
return texts