vllm.v1.worker.workspace ¶
WorkspaceManager ¶
Manager for workspace allocation.
Manages workspace buffers for DBO (Dual Batch Overlap) execution. Can be locked to prevent further growth during execution.
Source code in vllm/v1/worker/workspace.py
_num_ubatches instance-attribute ¶
__init__ ¶
_ensure_workspace_size ¶
Ensure the workspace is allocated and large enough, then return the current workspace.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| required_bytes | int | The number of bytes required. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | The current workspace tensor. |
_workspace_size_bytes staticmethod ¶
get_simultaneous ¶
Get multiple workspace tensors simultaneously from a single allocation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *shapes_and_dtypes | tuple[tuple[int, ...], dtype] | One or more (shape, dtype) tuples. | () |
Returns:
| Type | Description |
|---|---|
| list[Tensor] | List of tensor views into the workspace buffer, one per shape/dtype pair. |
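How several tensors can be carved out of a single allocation can be sketched by computing byte offsets per request. The itemsize table, the 256-byte alignment, and the helper name below are assumptions for illustration only; the actual view construction in vLLM slices a device tensor.

```python
import math

_ITEMSIZE = {"float16": 2, "float32": 4, "int32": 4}  # assumed dtype sizes
_ALIGN = 256  # assumed alignment between consecutive sub-buffers


def partition_offsets(*shapes_and_dtypes):
    """Return (offset, nbytes) per request and the total bytes needed."""
    offset = 0
    out = []
    for shape, dtype in shapes_and_dtypes:
        nbytes = math.prod(shape) * _ITEMSIZE[dtype]
        out.append((offset, nbytes))
        # Round up so the next view starts on an aligned boundary.
        offset += (nbytes + _ALIGN - 1) // _ALIGN * _ALIGN
    return out, offset
```

The total is what a single `_ensure_workspace_size` call would have to cover; each returned offset then becomes a view into that one buffer.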
lock ¶
Lock the workspace to prevent further growth.
After locking, any attempt to allocate a larger workspace will raise an assertion error. This ensures workspace size is fixed during execution.
_compute_bytes ¶
current_workspace_manager ¶
current_workspace_manager() -> WorkspaceManager
Get the current workspace manager instance.
Raises:
| Type | Description |
|---|---|
| AssertionError | If workspace manager has not been initialized. |
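The fail-fast accessor pattern can be sketched as a module-level singleton guard. The names here are illustrative, not the actual vLLM internals:

```python
_manager = None  # module-level singleton, set by the init call


def current_manager_sketch():
    """Illustrative accessor: fail fast when init was never called."""
    assert _manager is not None, "workspace manager is not initialized"
    return _manager
```

Asserting rather than lazily initializing keeps the device choice explicit: the caller that forgot to initialize fails immediately instead of getting a workspace on a default device.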
init_workspace_manager ¶
Initialize the workspace manager with a device.
Must be called before using any workspace functions. Typically called from `GPUModelRunner.__init__`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | device | The device to allocate workspace on. | required |
| num_ubatches | int \| None | Number of micro-batches. Defaults to 1. | None |
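The init behavior, including the `num_ubatches` default of 1, can be sketched as follows. The class and function names are stand-ins for illustration; only the parameter semantics come from the table above.

```python
class _ManagerSketch:
    """Illustrative manager holding the two init-time parameters."""

    def __init__(self, device, num_ubatches):
        self.device = device
        self.num_ubatches = num_ubatches


_mgr = None


def init_manager_sketch(device, num_ubatches=None):
    """Illustrative init: num_ubatches falls back to 1 when not given."""
    global _mgr
    _mgr = _ManagerSketch(device, 1 if num_ubatches is None else num_ubatches)
    return _mgr
```

With DBO enabled, `num_ubatches` would be 2 (one workspace per overlapped micro-batch); the single-batch default is 1.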
is_workspace_manager_initialized ¶
is_workspace_manager_initialized() -> bool
Check if workspace manager has been initialized.
Returns:
| Type | Description |
|---|---|
| bool | True if workspace manager is initialized, False otherwise. |
lock_workspace ¶
Lock the workspace to prevent further growth.
After calling this function, any attempt to allocate a workspace larger than the current size will raise an AssertionError. This ensures that workspace size is fixed during execution and prevents unexpected memory allocations in the hot path.
Example

```python
# During initialization
init_workspace_manager(device)
reserve_workspace(shape1, dtype1)
reserve_workspace(shape2, dtype2)

# Lock after warmup/profiling
lock_workspace()

# Now all get_workspace calls must fit in the pre-allocated size
```
reset_workspace_manager ¶
Reset the workspace manager to uninitialized state.
This is primarily intended for testing purposes to allow tests to reinitialize the workspace manager cleanly.