ACDC
auto_circuit.prune_algos.ACDC
Functions
acdc_prune_scores
acdc_prune_scores(model: PatchableModel, dataloader: PromptDataLoader, official_edges: Optional[Set[Edge]], tao_exps: List[int] = list(range(-5, -1)), tao_bases: List[int] = [1, 3, 5, 7, 9], faithfulness_target: Literal['kl_div', 'mse'] = 'kl_div', test_mode: bool = False, run_circuits_ref: Optional[Callable[..., CircuitOutputs]] = None, show_graphs: bool = False, draw_seq_graph_ref: Optional[Callable[..., Figure]] = None) -> PruneScores
Run the ACDC algorithm from the paper "Towards Automated Circuit Discovery for Mechanistic Interpretability" (Conmy et al., 2023).
The algorithm does not assign a score to each edge; instead, it finds the set of edges to prune for a given value of the threshold tao. We therefore run the algorithm for several values of tao (one for each combination of tao_exps and tao_bases) and give equal scores to all edges that are pruned at a given tao. We then use test_edge_counts to pass edge counts to run_circuits so that all edges with the same score are pruned together.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PatchableModel | The model to find the circuit for. | required |
dataloader | PromptDataLoader | The dataloader to use for input and patches. | required |
official_edges | Optional[Set[Edge]] | Not used. | required |
tao_exps | List[int] | The exponents to use for the set of tao values. | list(range(-5, -1)) |
tao_bases | List[int] | The bases to use for the set of tao values. | [1, 3, 5, 7, 9] |
faithfulness_target | Literal['kl_div', 'mse'] | The faithfulness metric to optimize the circuit for. | 'kl_div' |
test_mode | bool | Run the model in test mode. This mode computes the output of each ablation again using the slower | False |
run_circuits_ref | Optional[Callable[..., CircuitOutputs]] | Reference to the function | None |
show_graphs | bool | Whether to visualize the model activations during training using | False |
draw_seq_graph_ref | Optional[Callable[..., Figure]] | Reference to the function | None |
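The table does not state how tao_exps and tao_bases are combined into tao values. A minimal sketch, assuming each tao is computed as base * 10**exp for every (exponent, base) pair; tao_grid is a hypothetical helper, not part of auto_circuit. Note that list(range(-5, -1)) yields the exponents -5 to -2, so with the defaults the largest tao would be 9 * 10**-2.

```python
from typing import List


def tao_grid(tao_exps: List[int], tao_bases: List[int]) -> List[float]:
    """Hypothetical: one tao per (exponent, base) pair, computed as base * 10**exp."""
    return sorted(base * 10.0**exp for exp in tao_exps for base in tao_bases)


# Default values from the signature: exponents -5..-2 and five bases.
taos = tao_grid(list(range(-5, -1)), [1, 3, 5, 7, 9])
print(len(taos))  # 20
print(taos[0], taos[-1])  # smallest and largest tao (about 1e-05 and 0.09)
```

A log-spaced grid like this covers several orders of magnitude with few runs, which matters because each tao requires a full ACDC pass.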
Returns:
Type | Description |
---|---|
PruneScores | An ordering of the edges by importance to the task. Importance is equal to the absolute value of the score assigned to the edge. |
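A minimal sketch of reading off the k most important edges from such an ordering. It assumes a plain dict from module name to a list of per-edge scores as a stand-in for PruneScores (the real type may hold tensors instead of lists); rank_edges, top_k_edges and the module names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for PruneScores: module name -> per-edge scores.
ScoresLike = Dict[str, List[float]]


def rank_edges(scores: ScoresLike) -> List[Tuple[str, int, float]]:
    """Return (module, edge index, score) tuples, most important first."""
    flat = [(mod, i, s) for mod, vals in scores.items() for i, s in enumerate(vals)]
    # Importance is the absolute value of the score, so the sign is ignored.
    return sorted(flat, key=lambda item: abs(item[2]), reverse=True)


def top_k_edges(scores: ScoresLike, k: int) -> List[Tuple[str, int]]:
    """The k edges a size-k circuit would keep under this ordering."""
    return [(mod, i) for mod, i, _ in rank_edges(scores)[:k]]


scores = {"blocks.0.attn": [0.03, -0.5, 0.0005], "blocks.1.mlp": [float("inf")]}
print(top_k_edges(scores, 2))  # [('blocks.1.mlp', 0), ('blocks.0.attn', 1)]
```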
Note
Only the first batch of the dataloader is used.