# Hyperparameter Tuning
PIDSMaker simplifies hyperparameter tuning by combining its efficient pipeline design with the power of W&B Sweeps.
## Dataset-specific tuning
Tuning is configured using YAML files, just like system definitions. For example, suppose you've created a new system named `my_system`, and its configuration is stored in `config/my_system.yml`. To search for optimal hyperparameters on the `THEIA_E3` dataset, you can create a new tuning configuration file at `config/experiments/tuning/systems/theia_e3/tuning_my_system.yml` following the W&B syntax:
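As a minimal sketch, a sweep file could look like the following; the `metric` name and the hyperparameter keys (`lr`, `num_epochs`) are illustrative assumptions, not PIDSMaker's actual configuration schema:

```yaml
# tuning_my_system.yml -- hypothetical W&B sweep configuration.
# Hyperparameter names (lr, num_epochs) are illustrative only.
method: grid          # try every combination of the listed values
metric:
  name: val_loss      # metric the sweep optimizes
  goal: minimize
parameters:
  lr:
    values: [0.001, 0.0001]
  num_epochs:
    values: [10, 20]
```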
- Other hyperparameter search strategies, such as `random` and `bayes`, can also be used (see the W&B Sweeps documentation for details).
Before starting the hyperparameter search, make sure you are logged in to W&B by running `wandb login` inside the container's shell. Then launch the hyperparameter tuning with the `--tuning_mode=hyperparameters` flag:
```bash
./run.sh my_system THEIA_E3 --tuning_mode=hyperparameters
```
This flag automatically loads the tuning configuration from `config/experiments/tuning/systems/theia_e3/tuning_my_system.yml`, based on the provided dataset and system names. Any overlapping arguments defined in `config/my_system.yml` are overridden by those specified in the tuning file. If the specified tuning file does not exist (i.e., no dataset-specific configuration is available), the pipeline falls back to the default tuning configuration: `config/experiments/tuning/systems/default/tuning_default_baselines.yml`.
Note

You can also pass arguments directly via the CLI. CLI arguments always take precedence and override both the system configuration (`config/my_system.yml`) and the tuning configuration (`tuning_my_system.yml`).
## Cross-dataset tuning
If you wish to use the same hyperparameter tuning configuration across all datasets, you can explicitly specify the tuning file as a command-line argument. For instance, you might create a tuning file named `config/experiments/tuning/systems/default/tuning_my_system_all_datasets.yml`, then apply it to all datasets by running:
```bash
./run_all_datasets.py my_system \
    --tuning_mode=hyperparameters \
    --tuning_file_path=systems/default/tuning_my_system_all_datasets
```
or on a single dataset:
```bash
./run.sh my_system CLEARSCOPE_E3 \
    --tuning_mode=hyperparameters \
    --tuning_file_path=systems/default/tuning_my_system_all_datasets
```
The path passed to `--tuning_file_path` should start from `systems/`.
Tips

To keep a clearer history of experiments, we recommend assigning a name to each sweep (`--exp`) and a dedicated project name (`--project`):
```bash
./run.sh my_system CADETS_E3 \
    --tuning_mode=hyperparameters \
    --exp=bench_fasttext_word2vec \
    --project=best_featurization_method
```
## Best model selection
Once the sweep has finished, the best run can be obtained from W&B by sorting based on your desired metric.
Retrieve the hyperparameters associated with the best run and save them in a new file at `config/tuned_baselines/{dataset}/tuned_my_system.yml`.
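For illustration, such a tuned file simply pins the best values found by the sweep; the keys below (`lr`, `num_epochs`) are hypothetical and should match your system's actual hyperparameter names:

```yaml
# tuned_my_system.yml -- hypothetical tuned configuration.
# Keys mirror the system's hyperparameters and override the
# defaults in config/my_system.yml.
lr: 0.0001
num_epochs: 20
```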
Each system should have one tuned file per dataset; alternatively, if a system uses the same hyperparameters across all datasets, the best values can be set directly in its `my_system.yml` file.
Note

The selection of the best hyperparameters is currently done by hand but will likely be automated in the future.
## Run a tuned system
Running a system with its best hyperparameters on a particular dataset is as simple as using the `--tuned` argument:
```bash
./run.sh my_system CADETS_E3 --tuned
```
This will look for the `config/tuned_baselines/cadets_e3/tuned_my_system.yml` file and override the default system configuration with the best hyperparameters.