vllm.tokenizers ¶
Modules:
| Name | Description |
|---|---|
| deepseek_v32_encoding | |
| deepseekv32 | |
| detokenizer_utils | |
| hf | |
| mistral | |
| protocol | |
| registry | |
TokenizerRegistry module-attribute ¶
TokenizerRegistry = _TokenizerRegistry(
{
mode: (f"vllm.tokenizers.{mod_relname}", cls_name)
for mode, (mod_relname, cls_name) in (items())
}
)
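The registry maps each tokenizer mode to a lazily importable `(module path, class name)` pair; note that the mapping whose `items()` is iterated in the comprehension above is cut off in this listing. A minimal sketch of the shape the comprehension produces, using a hypothetical mode table (`_MODES` and the class names are illustrative; only the `hf` and `mistral` module names come from the module table above):

```python
# Hypothetical mode table; the real mapping lives in vllm/tokenizers/__init__.py.
_MODES = {
    "hf": ("hf", "HfTokenizer"),                  # class name assumed
    "mistral": ("mistral", "MistralTokenizer"),   # class name assumed
}

registry_entries = {
    mode: (f"vllm.tokenizers.{mod_relname}", cls_name)
    for mode, (mod_relname, cls_name) in _MODES.items()
}
# -> {'hf': ('vllm.tokenizers.hf', 'HfTokenizer'),
#     'mistral': ('vllm.tokenizers.mistral', 'MistralTokenizer')}
```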
__all__ module-attribute ¶
__all__ = [
"TokenizerLike",
"TokenizerRegistry",
"cached_get_tokenizer",
"get_tokenizer",
"cached_tokenizer_from_config",
"init_tokenizer_from_config",
]
TokenizerLike ¶
Bases: Protocol
Source code in vllm/tokenizers/protocol.py
__call__ ¶
apply_chat_template ¶
convert_ids_to_tokens ¶
convert_tokens_to_string ¶
decode ¶
encode ¶
from_pretrained classmethod ¶
from_pretrained(
path_or_repo_id: str | Path,
*args,
trust_remote_code: bool = False,
revision: str | None = None,
download_dir: str | None = None,
**kwargs,
) -> TokenizerLike
Source code in vllm/tokenizers/protocol.py
get_added_vocab ¶
get_vocab ¶
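Because TokenizerLike is a Protocol, any class exposing the members listed above can be used wherever vLLM expects a tokenizer. Below is a minimal, illustrative sketch: the `from_pretrained` signature is taken from this page, while the remaining method signatures (and the whitespace-splitting behavior) are assumptions, not vLLM's actual HF or Mistral implementations.

```python
from pathlib import Path


class ToyTokenizer:
    """Illustrative stand-in that mirrors the TokenizerLike members."""

    def __init__(self) -> None:
        self._vocab: dict[str, int] = {"<unk>": 0}

    @classmethod
    def from_pretrained(
        cls,
        path_or_repo_id: str | Path,
        *args,
        trust_remote_code: bool = False,
        revision: str | None = None,
        download_dir: str | None = None,
        **kwargs,
    ) -> "ToyTokenizer":
        # A real implementation would load vocab files from the path/repo.
        return cls()

    def __call__(self, text: str, **kwargs) -> dict:
        return {"input_ids": self.encode(text)}

    def encode(self, text: str, **kwargs) -> list[int]:
        return [self._vocab.setdefault(tok, len(self._vocab)) for tok in text.split()]

    def decode(self, ids: list[int], **kwargs) -> str:
        rev = {v: k for k, v in self._vocab.items()}
        return " ".join(rev.get(i, "<unk>") for i in ids)

    def convert_ids_to_tokens(self, ids: list[int]) -> list[str]:
        rev = {v: k for k, v in self._vocab.items()}
        return [rev.get(i, "<unk>") for i in ids]

    def convert_tokens_to_string(self, tokens: list[str]) -> str:
        return " ".join(tokens)

    def apply_chat_template(self, messages: list[dict], **kwargs) -> str:
        # Assumed message format: [{"role": ..., "content": ...}]
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

    def get_vocab(self) -> dict[str, int]:
        return dict(self._vocab)

    def get_added_vocab(self) -> dict[str, int]:
        return {}
```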
cached_tokenizer_from_config ¶
cached_tokenizer_from_config(
model_config: ModelConfig, **kwargs
)
Source code in vllm/tokenizers/registry.py
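A hedged usage sketch: the helper takes a `vllm.config.ModelConfig`, and the caching means repeated calls with the same config are expected to return the same tokenizer object. Constructing a `ModelConfig` directly with only a model name relies on defaults and may vary by vLLM version; in practice the config usually comes from an existing engine.

```python
from vllm.config import ModelConfig
from vllm.tokenizers import cached_tokenizer_from_config

# Assumption: ModelConfig defaults suffice beyond the model name.
model_config = ModelConfig(model="facebook/opt-125m")  # example model

tok_a = cached_tokenizer_from_config(model_config)
tok_b = cached_tokenizer_from_config(model_config)
assert tok_a is tok_b  # assumption: the cache returns the same instance
```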
get_tokenizer ¶
get_tokenizer(
tokenizer_name: str | Path,
*args,
tokenizer_cls: type[_T] = TokenizerLike,
trust_remote_code: bool = False,
revision: str | None = None,
download_dir: str | None = None,
**kwargs,
) -> _T
Gets a tokenizer for the given model name via HuggingFace or ModelScope.
Source code in vllm/tokenizers/registry.py
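A usage sketch for `get_tokenizer`; the repo id below is only an example, and extra positional/keyword arguments are forwarded to the underlying tokenizer class.

```python
from vllm.tokenizers import get_tokenizer

tokenizer = get_tokenizer(
    "facebook/opt-125m",   # example Hugging Face repo id
    trust_remote_code=False,
    revision=None,
)

ids = tokenizer.encode("Hello, world!")
print(tokenizer.decode(ids))
```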
init_tokenizer_from_config ¶
init_tokenizer_from_config(model_config: ModelConfig)
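A minimal sketch for `init_tokenizer_from_config`, under the same `ModelConfig` assumption as the `cached_tokenizer_from_config` sketch above; unlike the cached variant, each call is assumed to build a fresh tokenizer instance.

```python
from vllm.config import ModelConfig
from vllm.tokenizers import init_tokenizer_from_config

model_config = ModelConfig(model="facebook/opt-125m")  # example model

tokenizer = init_tokenizer_from_config(model_config)
print(type(tokenizer).__name__)
```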