Reverse official
auto_circuit.metrics.official_circuits.circuits.tracr.reverse_official
Functions
tracr_reverse_acdc_edges
tracr_reverse_acdc_edges(model: PatchableModel, token_positions: bool = False, word_idxs: Dict[str, int] = {}, seq_start_idx: int = 0) -> Set[Edge]
The canonical circuit for tracr-reverse according to Conmy et al. (2023). Based on the ACDC repo.
As discussed in Miller et al. (forthcoming), this circuit is (intended to be) the set of edges that must be preserved when Zero Ablation is used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PatchableModel | A patchable TransformerLens tracr-reverse | required |
token_positions | bool | Whether to distinguish between token positions when returning the set of circuit edges. If | False |
word_idxs | Dict[str, int] | A dictionary defining the index of specific named tokens in the circuit definition. This variable is not used in this circuit; instead, we assume a sequence of length 6 (including BOS). | {} |
seq_start_idx | int | Offset to add to all of the token positions in | 0 |
Returns:
Type | Description |
---|---|
Set[Edge] | The set of edges in the circuit. |
Note
The sequence positions assume prompts of length 6 (including BOS), as in tracr/tracr_reverse_len_5_prompts.json
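The role of token_positions and seq_start_idx can be pictured with a small, self-contained sketch. This is not the library's implementation: ToyEdge, select_edges, and the edge names and positions are hypothetical stand-ins, and the behaviour when token_positions is False (each named edge returned once, without a position) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ToyEdge:
    """Hypothetical stand-in for an edge; not auto_circuit's Edge class."""

    name: str
    seq_idx: Optional[int] = None  # None means "no specific token position"


def select_edges(
    positions_by_edge: Dict[str, List[int]],
    token_positions: bool = False,
    seq_start_idx: int = 0,
) -> Set[ToyEdge]:
    """Build an edge set from a name -> token-positions mapping (illustrative only)."""
    edges: Set[ToyEdge] = set()
    for name, positions in positions_by_edge.items():
        if token_positions:
            # One edge per (name, position), with the offset added to each position.
            for pos in positions:
                edges.add(ToyEdge(name, pos + seq_start_idx))
        else:
            # Assumed behaviour: positions are ignored and each edge appears once.
            edges.add(ToyEdge(name))
    return edges


# Made-up edge names and positions for a length-6 prompt (BOS at index 0).
toy_circuit = {"A->B": [1, 2, 3], "B->C": [4, 5]}

without_positions = select_edges(toy_circuit)
with_positions = select_edges(toy_circuit, token_positions=True)
shifted = select_edges(toy_circuit, token_positions=True, seq_start_idx=2)
```

In this toy setup, without_positions holds one entry per edge name, with_positions holds one entry per (name, position) pair, and shifted holds the same pairs with every position moved by seq_start_idx.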
Source code in auto_circuit/metrics/official_circuits/circuits/tracr/reverse_official.py
tracr_reverse_true_edges
tracr_reverse_true_edges(model: PatchableModel, token_positions: bool = False, word_idxs: Dict[str, int] = {}, seq_start_idx: int = 0) -> Set[Edge]
The canonical circuit for tracr-reverse according to Miller et al. (forthcoming). As discussed in the paper, this circuit is the set of edges that must be preserved when Resample Ablation is used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PatchableModel | A patchable TransformerLens tracr-reverse | required |
token_positions | bool | Whether to distinguish between token positions when returning the set of circuit edges. If | False |
word_idxs | Dict[str, int] | A dictionary defining the index of specific named tokens in the circuit definition. This variable is not used in this circuit; instead, we assume a sequence of length 6 (including BOS). | {} |
seq_start_idx | int | Offset to add to all of the token positions in | 0 |
Returns:
Type | Description |
---|---|
Set[Edge] | The set of edges in the circuit. |
Note
The sequence positions assume prompts of length 6 (including BOS), as in tracr/tracr_reverse_len_5_prompts.json
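Because both tracr_reverse_acdc_edges and tracr_reverse_true_edges return a Set[Edge], the Zero Ablation and Resample Ablation circuits can be compared with ordinary set operations. The sketch below uses plain strings with made-up edge names as stand-ins for Edge objects, so it runs without a model; the values do not reflect the actual tracr-reverse circuits.

```python
from typing import Set

# Made-up edge names standing in for Edge objects. In practice the two sets
# would come from tracr_reverse_acdc_edges(model) and
# tracr_reverse_true_edges(model).
acdc_edges: Set[str] = {"A->B", "B->C", "C->D"}
true_edges: Set[str] = {"A->B", "B->C", "B->D"}

shared = acdc_edges & true_edges      # edges in both circuits
only_acdc = acdc_edges - true_edges   # edges only in the ACDC circuit
only_true = true_edges - acdc_edges   # edges only in the Resample Ablation circuit

# Jaccard similarity: size of the overlap relative to the union.
jaccard = len(shared) / len(acdc_edges | true_edges)
```

The same expressions apply unchanged to real Edge sets, since membership in a set only requires that the elements are hashable.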