ACDC
auto_circuit.prune_algos.ACDC
Functions
acdc_prune_scores
acdc_prune_scores(model: PatchableModel, dataloader: PromptDataLoader, official_edges: Optional[Set[Edge]], tao_exps: List[int] = list(range(-5, -1)), tao_bases: List[int] = [1, 3, 5, 7, 9], faithfulness_target: Literal['kl_div', 'mse'] = 'kl_div', test_mode: bool = False, run_circuits_ref: Optional[Callable[..., CircuitOutputs]] = None, show_graphs: bool = False, draw_seq_graph_ref: Optional[Callable[..., Figure]] = None) -> PruneScores
Run the ACDC algorithm from the paper "Towards Automated Circuit Discovery for Mechanistic Interpretability" (Conmy et al. (2023)).
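To make the threshold concrete, here is a toy sketch of the greedy pruning loop as the paper describes it. Edges are visited in reverse topological order, and an edge is removed permanently if removing it worsens the faithfulness metric by less than τ. This is not the auto_circuit implementation. The helper `acdc_prune_at_threshold`, the string edge names and the additive `toy_metric` are hypothetical, and the edge list is assumed to be pre-sorted.

```python
from typing import Callable, FrozenSet, List, Set


def acdc_prune_at_threshold(
    edges: List[str],
    metric: Callable[[FrozenSet[str]], float],
    tau: float,
) -> Set[str]:
    """Return the edges pruned at threshold `tau` (hypothetical helper).

    `edges` is assumed to already be in reverse topological order.
    `metric(kept)` is the divergence between the full model and the circuit
    that keeps only `kept` (0 for the full model, lower is better).
    """
    kept = set(edges)
    pruned: Set[str] = set()
    current = metric(frozenset(kept))
    for edge in edges:
        candidate = kept - {edge}
        new = metric(frozenset(candidate))
        # Drop the edge for good if removing it worsens the metric by less than tau.
        if new - current < tau:
            kept = candidate
            pruned.add(edge)
            current = new
    return pruned


# Toy "model": the output is the sum of the contributions of the kept edges,
# and the metric is the squared error against the full model's output.
contributions = {"a": 0.001, "b": 0.5, "c": 0.02}
full_output = sum(contributions.values())


def toy_metric(kept: FrozenSet[str]) -> float:
    return (full_output - sum(contributions[e] for e in kept)) ** 2


edges = ["a", "b", "c"]
print(sorted(acdc_prune_at_threshold(edges, toy_metric, 1e-4)))  # ['a']
print(sorted(acdc_prune_at_threshold(edges, toy_metric, 1e-2)))  # ['a', 'c']
```

In this toy example a larger τ prunes more edges; in general a larger threshold tends to yield a smaller circuit.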
The algorithm does not assign a score to each edge; instead, it finds the set of edges to prune for a given value of the threshold τ (spelled `tao` in the parameter names). So we run the algorithm for several values of τ (every combination of `tao_exps` and `tao_bases`) and give equal scores to all edges that are pruned for a given τ. Then we use `test_edge_counts` to pass edge counts to `run_circuits` such that all edges with the same score are pruned together.
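The score assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's code: it assumes an edge's score is the smallest τ at which it is pruned (never-pruned edges score +inf) and that an edge count is the number of edges kept in the circuit. The helper names and the `pruned_by_tau` data are hypothetical.

```python
import math
from typing import Dict, List, Set


def scores_from_pruned_sets(
    edges: List[str], pruned_by_tau: Dict[float, Set[str]]
) -> Dict[str, float]:
    # Assumption: an edge's score is the smallest tau at which it is pruned,
    # so every edge first pruned at the same tau gets the same score.
    # Edges that are never pruned get +inf (most important).
    scores = {e: math.inf for e in edges}
    for tau, pruned in pruned_by_tau.items():
        for e in pruned:
            scores[e] = min(scores[e], tau)
    return scores


def circuit_edge_counts(scores: Dict[str, float]) -> List[int]:
    # Assumption: an "edge count" is the number of edges kept in the circuit.
    # Cutting only between distinct score values means ties are never split.
    thresholds = sorted({s for s in scores.values() if math.isfinite(s)})
    return sorted({sum(1 for s in scores.values() if s > t) for t in thresholds})


pruned_by_tau = {1e-7: set(), 1e-4: {"a"}, 1e-2: {"a", "c"}}
scores = scores_from_pruned_sets(["a", "b", "c"], pruned_by_tau)
print(scores)  # {'a': 0.0001, 'b': inf, 'c': 0.01}
print(circuit_edge_counts(scores))  # [1, 2]
```

Because the counts only fall between distinct score values, every cut keeps or drops all equally scored edges together.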
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PatchableModel | The model to find the circuit for. | required |
dataloader | PromptDataLoader | The dataloader to use for input and patches. | required |
official_edges | Optional[Set[Edge]] | Not used. | required |
tao_exps | List[int] | The exponents to use for the set of tao (τ) values. | list(range(-5, -1)) |
tao_bases | List[int] | The bases to use for the set of tao (τ) values. | [1, 3, 5, 7, 9] |
faithfulness_target | Literal['kl_div', 'mse'] | The faithfulness metric to optimize the circuit for. | 'kl_div' |
test_mode | bool | Run the model in test mode. This mode recomputes the output of each ablation using the slower `run_circuits` function. | False |
run_circuits_ref | Optional[Callable[..., CircuitOutputs]] | Reference to the `run_circuits` function. | None |
show_graphs | bool | Whether to visualize the model activations while the algorithm runs, using `draw_seq_graph`. | False |
draw_seq_graph_ref | Optional[Callable[..., Figure]] | Reference to the `draw_seq_graph` function. | None |
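One way the two lists could combine into a grid of thresholds is sketched below, assuming (the table above does not confirm this) that each τ is `base * 10**exp` for every pair of an exponent and a base. The helper `tau_grid` is hypothetical. Note that `list(range(-5, -1))` is `[-5, -4, -3, -2]`, because `range` excludes its stop value, so the defaults give 4 × 5 = 20 thresholds.

```python
from itertools import product
from typing import List


def tau_grid(tao_exps: List[int], tao_bases: List[int]) -> List[float]:
    # Assumption: each threshold is base * 10**exp, one per (exp, base) pair.
    return sorted(base * 10.0**exp for exp, base in product(tao_exps, tao_bases))


taus = tau_grid(list(range(-5, -1)), [1, 3, 5, 7, 9])
print(len(taus))  # 20
print(taus[0], taus[-1])  # smallest and largest threshold
```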
Returns:
Type | Description |
---|---|
PruneScores | An ordering of the edges by importance to the task. Importance is equal to the absolute value of the score assigned to the edge. |
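The ordering described in Returns can be illustrated with a plain dict mapping hypothetical edge names to float scores, used here only as a stand-in for `PruneScores` (whose actual structure is not described on this page). The helper `rank_edges` is hypothetical; ranking by absolute score puts the most important edges first.

```python
import math
from typing import Dict, List


def rank_edges(scores: Dict[str, float]) -> List[str]:
    # Importance is |score|: most important edge first.
    # sorted() is stable (also with reverse=True), so equal scores keep input order.
    return sorted(scores, key=lambda e: abs(scores[e]), reverse=True)


scores = {"a": 1e-4, "b": math.inf, "c": 1e-2}
print(rank_edges(scores))  # ['b', 'c', 'a']
```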
Note:
Only the first batch of the `dataloader` is used.