Configuration
Each model has a statically typed configuration model, with default settings that are applied when the configuration is instantiated. For example, to create a default preprocessing configuration:
from everyvoice.config.preprocessing_config import PreprocessingConfig
preprocessing_config = PreprocessingConfig()
Static typing means that misconfiguration errors surface as soon as the configuration is instantiated, instead of producing downstream runtime errors. It also means that IntelliSense is available in your code editor when working with a configuration class.
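The idea of failing at instantiation time can be sketched with a stdlib-only dataclass. EveryVoice's actual configuration models are built differently (on typed, validated models), so the class and field names below are illustrative only:

```python
from dataclasses import dataclass

# Sketch of the principle behind typed configuration: validate fields when
# the object is created, so bad values fail immediately, not mid-training.
# ToyTrainingConfig and its fields are made up for this illustration.
@dataclass
class ToyTrainingConfig:
    batch_size: int = 16
    ckpt_epochs: int = 1

    def __post_init__(self):
        for name, value in (("batch_size", self.batch_size),
                            ("ckpt_epochs", self.ckpt_epochs)):
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

config = ToyTrainingConfig()          # defaults pass validation
try:
    ToyTrainingConfig(batch_size=-4)  # fails here, not during training
except ValueError as e:
    print(e)
```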
Sharing Configurations
The Text and Preprocessing configurations should only be defined once per dataset and shared between your models to ensure each model makes the same assumptions about your data.
To achieve that, each model configuration can also be defined as a path to a configuration file. So, a configuration for a text-to-spec model that uses separately defined text and audio preprocessing configurations might look like this:
model:
decoder: ...
...
training:
batch_size: 16
ckpt_epochs: 1
...
path_to_preprocessing_config_file: "./config/default/everyvoice-shared-data.yaml"
path_to_text_config_file: "./config/default/everyvoice-shared-text.yaml"
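For illustration, a shared text configuration file referenced this way might contain something like the following. The symbol values are made up for this example; the cleaner list matches the defaults discussed in the Serialization section below:

```yaml
# ./config/default/everyvoice-shared-text.yaml (illustrative contents)
symbols:
  dataset_0_characters: ['a', 'b', 'c']
cleaners:
  - everyvoice.utils.lower
  - everyvoice.utils.collapse_whitespace
  - everyvoice.utils.nfc_normalize
```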
Serialization
By default, configuration objects are serialized as dictionaries, which works as expected for integers, floats, lists, booleans, dicts, etc. But there are some cases where you need to specify a Callable in your configuration. For example, the TextConfig has a cleaners field that takes a list of Callables to apply, in order, to raw text.
By default, these functions turn raw text to lowercase, collapse whitespace, and normalize using Unicode NFC normalization. In Python, we could instantiate this by passing the callables directly like so:
from everyvoice.config.text_config import TextConfig
from everyvoice.utils import collapse_whitespace, lower, nfc_normalize
text_config = TextConfig(cleaners=[lower, collapse_whitespace, nfc_normalize])
But, for YAML or JSON configuration, we need to serialize these functions. To do so, EveryVoice turns each callable into module dot-notation. That is, your configuration will look like this in YAML:
cleaners:
- everyvoice.utils.lower
- everyvoice.utils.collapse_whitespace
- everyvoice.utils.nfc_normalize
This will then be de-serialized upon instantiation of your configuration.
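The dot-notation round trip can be sketched with the standard library. This is an illustration of the mechanism only, not EveryVoice's actual serialization code, and `json.dumps` stands in for a cleaner function:

```python
import importlib
import json

def serialize_callable(fn):
    """Turn a function into module dot-notation, e.g. 'json.dumps'."""
    return f"{fn.__module__}.{fn.__qualname__}"

def deserialize_callable(dotted: str):
    """Import the module and look the function back up by name."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

dotted = serialize_callable(json.dumps)
assert dotted == "json.dumps"
assert deserialize_callable(dotted) is json.dumps  # round trip recovers the function
```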
Text Configuration
The TextConfig is where you define the symbol set for your data and any cleaners used to turn your raw text into the text your model needs. You can share the TextConfig with any models that need it; you only need one text configuration per dataset (and possibly only one per language).
TextConfig
everyvoice.config.text_config.TextConfig
Bases: ConfigModel
Source code in everyvoice/config/text_config.py
class TextConfig(ConfigModel):
    symbols: Symbols = Field(default_factory=Symbols)
    to_replace: dict[str, str] = Field(
        default={},
        title="Global text replacements",
        description="Map of match-to-replacement to apply on training and run-time text, before cleaners are applied. Superseded by language_to_replace when processing text in a language which has language-specific text replacements, which are in turn superseded by dataset_to_replace when processing a dataset which has dataset-specific text replacements.",
    )
    language_to_replace: dict[str, dict[str, str]] = Field(
        default={},
        title="Language-specific text replacements",
        description="Map from language code to text replacement maps. Supersedes the global text replacements when defined for a given language. Superseded by dataset_to_replace when processing a dataset which has dataset-specific text replacements.",
    )
    dataset_to_replace: dict[str, dict[str, str]] = Field(
        default={},
        title="Dataset-specific text replacements",
        description="Map from dataset label to replacement maps. Supersedes both the global text replacements and language_to_replace when defined for a given dataset.",
    )
    cleaners: list[PossiblySerializedCallable] = Field(
        default=DEFAULT_CLEANERS,
        title="Global cleaners",
        description="List of cleaners to apply to all datasets and run-time data. Superseded by language_cleaners when processing text in a language which has language-specific cleaners, which are in turn superseded by dataset_cleaners when processing a dataset which has dataset-specific cleaners.",
    )
    language_cleaners: dict[str, list[PossiblySerializedCallable]] = Field(
        default={},
        title="Language-specific cleaners",
        description="Map from language code to cleaner lists. Supersedes the global cleaners when defined for a given language. Superseded by dataset_cleaners when processing a dataset which has dataset-specific cleaners.",
    )
    dataset_cleaners: dict[str, list[PossiblySerializedCallable]] = Field(
        default={},
        title="Dataset-specific cleaners",
        description="Map from dataset label to cleaner lists. Supersedes both the global cleaners and language_cleaners when defined for a given dataset.",
    )
    g2p_engines: G2P_Engines = Field(
        default={},
        title="External G2P",
        description="User defined or external G2P engines.\nSee https://github.com/EveryVoiceTTS/everyvoice_g2p_template_plugin to implement your own G2P.",
        examples=["""{"fr": "everyvoice_plugin_g2p4example.g2p"}"""],
    )
    split_text: bool = Field(
        default=True,
        title="Split Text",
        description="Whether or not to perform text splitting (also referred to as text chunking) at inference time. Instead of synthesizing an entire utterance, the utterance will be split into smaller chunks and re-combined after synthesis. This can lead to more natural synthesis for long-form (i.e. paragraph) synthesis.",
    )
    boundaries: dict[Language, LanguageBoundaries] = Field(
        default={},
        title="Boundaries",
        description="Strong and Weak boundaries on which text splitting is to be performed, for every language.",
        examples=["""{'eng': {'strong': '!?.', 'weak': ':;,'}}"""],
    )

    def get_cleaners(
        self, *, lang_id: str | None = None, dataset_label: str | None = None
    ) -> list[PossiblySerializedCallable]:
        """Get the cleaners to apply to a given dataset and language

        Dataset has top precedence, then language, falling back to global cleaners
        """
        if dataset_label is not None and dataset_label in self.dataset_cleaners:
            return self.dataset_cleaners[dataset_label]
        elif lang_id is not None and lang_id in self.language_cleaners:
            return self.language_cleaners[lang_id]
        else:
            return self.cleaners

    def get_to_replace(
        self, *, lang_id: str | None = None, dataset_label: str | None = None
    ) -> dict[str, str]:
        """Get the to_replace filters to apply to a given dataset and language

        Dataset has top precedence, then language, falling back to the global to_replace
        """
        if dataset_label is not None and dataset_label in self.dataset_to_replace:
            return self.dataset_to_replace[dataset_label]
        elif lang_id is not None and lang_id in self.language_to_replace:
            return self.language_to_replace[lang_id]
        else:
            return self.to_replace

    @model_validator(mode="after")
    def clean_symbols(self) -> Self:
        """We should apply all cleaners to the symbols

        Returns:
            TextConfig: a text config with cleaned symbols
        """
        for k, v in self.symbols:
            if k not in ["punctuation", "silence"]:
                dataset_label = get_label_from_symbol_key(k)
                cleaners = self.get_cleaners(dataset_label=dataset_label)
                to_replace = self.get_to_replace(dataset_label=dataset_label)
                normalized = [normalize_text_helper(x, to_replace, cleaners) for x in v]
                setattr(self.symbols, k, normalized)
                if "" in normalized or len(normalized) != len(set(normalized)):
                    logger.warning(
                        f"Normalization created a duplicate or inserted '' in {k}={normalized}. "
                        "Please check your shared-text config for problems."
                    )
        return self

    @model_validator(mode="after")
    def load_g2p_engines(self) -> Self:
        """
        Given `g2p_engines`, populate the global list `AVAILABLE_G2P_ENGINES`.
        """
        from everyvoice.text.phonemizer import AVAILABLE_G2P_ENGINES

        for lang_id, name in self.g2p_engines.items():
            g2p_func = load_custom_g2p_engine(lang_id, name)
            if lang_id in AVAILABLE_G2P_ENGINES:
                logger.warning(
                    f"Overriding g2p for `{lang_id}` with user provided g2p plugin `{name}`"
                )
            AVAILABLE_G2P_ENGINES[lang_id] = g2p_func
            logger.info(f"Adding G2P engine from `{name}` for `{lang_id}`")
        return self

    @staticmethod
    def load_config_from_path(path: Path) -> "TextConfig":
        """Load a config from a path"""
        config = load_config_from_json_or_yaml_path(path)
        with init_context({"config_path": path}):
            config = TextConfig(**config)
        return config
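The dataset-over-language-over-global fallback implemented by get_cleaners and get_to_replace can be sketched standalone. The function and variable names below are made up for this illustration; this is not EveryVoice code:

```python
def resolve_setting(global_value, per_language, per_dataset,
                    lang_id=None, dataset_label=None):
    """Pick the most specific override: dataset first, then language,
    then the global value, mirroring TextConfig.get_cleaners()."""
    if dataset_label is not None and dataset_label in per_dataset:
        return per_dataset[dataset_label]
    if lang_id is not None and lang_id in per_language:
        return per_language[lang_id]
    return global_value

cleaners = ["lower", "collapse_whitespace"]
language_cleaners = {"fr": ["lower"]}
dataset_cleaners = {"dataset_0": ["nfc_normalize"]}

# Dataset-specific cleaners win over both language and global cleaners:
assert resolve_setting(cleaners, language_cleaners, dataset_cleaners,
                       lang_id="fr", dataset_label="dataset_0") == ["nfc_normalize"]
# With no dataset override, the language-specific cleaners apply:
assert resolve_setting(cleaners, language_cleaners, dataset_cleaners,
                       lang_id="fr") == ["lower"]
# With neither, we fall back to the global cleaners:
assert resolve_setting(cleaners, language_cleaners, dataset_cleaners) == cleaners
```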
Symbols
Your symbol set is created by taking the union of all values defined. For example:
symbols:
dataset_0_characters: ['a', 'b', 'c']
dataset_1_characters: ['b', 'c', 'd']
will create a symbol set equal to {'a', 'b', 'c', 'd'} (i.e., the union of both values). This allows you to train models with data from different languages, for example.
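The union behaviour can be sketched in plain Python:

```python
# Each key under `symbols` contributes its values; the symbol set is their union.
symbols = {
    "dataset_0_characters": ["a", "b", "c"],
    "dataset_1_characters": ["b", "c", "d"],
}
symbol_set = set().union(*symbols.values())
assert symbol_set == {"a", "b", "c", "d"}
```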
Important
You should always manually inspect your configuration here to make sure it makes sense with respect to your data. Is there a symbol that shouldn't be there? Is there a symbol that's defined as 'punctuation' but is used as non-punctuation in your language? Please inspect these and update the configuration accordingly.
everyvoice.config.text_config.Symbols
Bases: BaseModel
Source code in everyvoice/config/text_config.py
class Symbols(BaseModel):
    silence: list[str] = Field(
        default=["<SIL>"], description="The symbol(s) used to indicate silence."
    )
    punctuation: Punctuation = Field(
        default_factory=Punctuation,
        description="EveryVoice will combine punctuation and normalize it into a set of five permissible types of punctuation to help tractable training.",
    )
    model_config = ConfigDict(extra="allow")

    @property
    def all_except_punctuation(self) -> set[str]:
        """Returns the set containing all characters."""
        return set(w for _, v in self if not isinstance(v, Punctuation) for w in v)

    @model_validator(mode="after")
    def cannot_have_punctuation_in_symbol_set(self) -> "Symbols":
        """You cannot have the same symbol defined in punctuation as elsewhere.

        Raises:
            ValueError: raised if a symbol from punctuation is found elsewhere

        Returns:
            Symbols: The validated symbol set
        """
        for punctuation in self.punctuation.all:
            if punctuation in self.all_except_punctuation:
                raise ValueError(
                    f"Sorry, the symbol '{punctuation}' occurs in both your declared punctuation and in your other symbol set. Please inspect your text configuration and either remove the symbol from the punctuation or other symbol set."
                )
        return self

    @model_validator(mode="after")
    def member_must_be_list_of_strings(self) -> "Symbols":
        """Except for `punctuation` & `pad`, all user defined member variables
        have to be a list of strings.
        """
        for k, v in self:
            if isinstance(v, Punctuation):
                continue
            if k == "pad":
                continue
            if not isinstance(v, list) or not all(isinstance(e, str) for e in v):
                raise ValueError(f"{k} must be a list")
        return self
all_except_punctuation
property
Returns the set containing all characters.
cannot_have_punctuation_in_symbol_set()
You cannot have the same symbol defined in punctuation as elsewhere.
Raises:
    ValueError: raised if a symbol from punctuation is found elsewhere

Returns:
    Symbols: The validated symbol set
member_must_be_list_of_strings()
Except for punctuation & pad, all user defined member variables
have to be a list of strings.
Source code in everyvoice/config/text_config.py
112
113
114
115
116
117
118
119
120
121
122
123
124
125 | @model_validator(mode="after")
def member_must_be_list_of_strings(self) -> "Symbols":
"""Except for `punctuation` & `pad`, all user defined member variables
have to be a list of strings.
"""
for k, v in self:
if isinstance(v, Punctuation):
continue
if k == "pad":
continue
if not isinstance(v, list) or not all(isinstance(e, str) for e in v):
raise ValueError(f"{k} must be a list")
return self
|