Featurization
Featurization methods transform the textual attributes of entities (e.g. file paths, process command lines, socket IP addresses and ports) into vectors.
Some methods, such as word2vec and doc2vec, learn these vectors from the text corpus, while others, such as hierarchical_hashing, compute them deterministically.
Finally, only_type and only_ones skip the embedding step entirely and assign each entity either a one-hot encoding of its type or a vector of ones; these two methods therefore take no arguments.
In all methods, the resulting vectors are used as node features during the gnn_training task.
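Before any corpus-based method can run, each entity's textual attribute has to be split into tokens. The exact tokenizer used by the pipeline is not documented here; the sketch below is one plausible stdlib-only version, splitting on path separators, whitespace, and common command-line punctuation.

```python
import re

def tokenize(attribute: str) -> list[str]:
    """Split a textual attribute (file path, command line, IP:port)
    into lowercase tokens. Illustrative only; the pipeline's actual
    tokenizer may differ."""
    return [t for t in re.split(r"[/\\\s:=,\.\-]+", attribute.lower()) if t]

# One tokenized attribute per entity forms the training corpus.
corpus = [
    tokenize("/usr/bin/python3 -m http.server 8080"),
    tokenize("C:\\Windows\\System32\\cmd.exe"),
]
```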
- word2vec
- alpha: float
- window_size: int
- min_count: int
- use_skip_gram: bool
- num_workers: int
- epochs: int
- compute_loss: bool
- negative: int
- decline_rate: int
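To make two of the word2vec arguments concrete, the sketch below generates skip-gram (center, context) training pairs the way standard word2vec does: window_size bounds how far a context token may sit from its center, and tokens rarer than min_count are dropped. This illustrates the meaning of the arguments, not the pipeline's internal trainer.

```python
from collections import Counter

def skipgram_pairs(corpus, window_size=2, min_count=1):
    """Generate (center, context) pairs as in skip-gram word2vec.
    Tokens with frequency below min_count are discarded first,
    mirroring the min_count argument."""
    counts = Counter(tok for sent in corpus for tok in sent)
    pairs = []
    for sent in corpus:
        kept = [t for t in sent if counts[t] >= min_count]
        for i, center in enumerate(kept):
            lo = max(0, i - window_size)
            hi = min(len(kept), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, kept[j]))
    return pairs
```

With use_skip_gram disabled, a CBOW objective would instead predict the center token from its context, but the windowing and min_count filtering work the same way.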
- doc2vec
- include_neighbors: bool
- epochs: int
- alpha: float
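For doc2vec, each entity is treated as a "document" of tokens. The include_neighbors flag presumably extends that document with the tokens of adjacent entities in the provenance graph; the sketch below shows that assumed behavior (the function name and semantics are illustrative, not taken from the source).

```python
def entity_document(node, node_tokens, neighbors, include_neighbors=False):
    """Build the token document for one entity.
    With include_neighbors=True, tokens of adjacent entities are
    appended (assumed semantics of the include_neighbors argument)."""
    doc = list(node_tokens[node])
    if include_neighbors:
        for nbr in neighbors.get(node, []):
            doc += node_tokens[nbr]
    return doc
```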
- fasttext
- min_count: int
- alpha: float
- window_size: int
- negative: int
- num_workers: int
- use_pretrained_fb_model: bool
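fasttext differs from word2vec by representing each token as a bag of character n-grams, which lets it embed unseen tokens (useful for long, rarely repeated file paths). The sketch below shows the standard n-gram extraction with the usual `<`/`>` boundary markers; use_pretrained_fb_model presumably loads Facebook's published pretrained vectors instead of training from scratch.

```python
def char_ngrams(token: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Character n-grams of a token with fastText-style boundary
    markers. A token's vector is the sum/average of its n-gram vectors."""
    padded = f"<{token}>"
    return [
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]
```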
- alacarte
- walk_length: int
- num_walks: int
- epochs: int
- context_window_size: int
- min_count: int
- use_skip_gram: bool
- num_workers: int
- compute_loss: bool
- add_paths: bool
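alacarte builds its contexts from random walks over the graph (the walk_length and num_walks arguments) before fitting a word2vec-style model on them. A minimal uniform-walk generator is sketched below; it is illustrative, and the pipeline's walker may bias or deduplicate walks differently.

```python
import random

def random_walks(adj, num_walks=2, walk_length=3, seed=0):
    """Uniform random walks over an adjacency dict; each walk becomes
    one 'sentence' for the downstream embedding model."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj.get(walk[-1], [])
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```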
- temporal_rw
- walk_length: int
- num_walks: int
- trw_workers: int
- time_weight: str
- half_life: int
- window_size: int
- min_count: int
- use_skip_gram: bool
- wv_workers: int
- epochs: int
- compute_loss: bool
- negative: int
- decline_rate: int
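temporal_rw constrains the walks to respect edge timestamps: each hop must not move backward in time, and the time_weight / half_life arguments bias the choice toward temporally close edges. The sketch below assumes an exponential-decay weighting (one plausible value of time_weight); the pipeline's actual weighting schemes are not specified here.

```python
import math
import random

def temporal_walk(edges, start, t0, walk_length=4, half_life=10, seed=0):
    """One temporal random walk over (src, dst, timestamp) edges.
    Each hop uses an edge no earlier than the previous one, sampled
    with exponential time decay (assumed 'exp'-style time_weight)."""
    rng = random.Random(seed)
    walk, node, t = [start], start, t0
    while len(walk) < walk_length:
        cand = [(v, ts) for (u, v, ts) in edges if u == node and ts >= t]
        if not cand:
            break
        # Half-life decay: an edge half_life time units away gets half the weight.
        weights = [math.exp(-math.log(2) * (ts - t) / half_life) for _, ts in cand]
        node, t = rng.choices(cand, weights=weights, k=1)[0]
        walk.append(node)
    return walk
```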
- flash
- min_count: int
- workers: int
- hierarchical_hashing
- magic
- only_type
- only_ones
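The two embedding-free baselines are simple enough to sketch directly from their description above: only_type one-hot encodes each entity's type, and only_ones assigns every entity a constant all-ones vector (the dimension here is an arbitrary illustrative choice).

```python
def only_type(node_types, all_types):
    """One-hot encode each entity's type (e.g. process, file, socket)."""
    index = {t: i for i, t in enumerate(sorted(all_types))}
    feats = {}
    for node, t in node_types.items():
        vec = [0.0] * len(index)
        vec[index[t]] = 1.0
        feats[node] = vec
    return feats

def only_ones(nodes, dim=16):
    """Assign a constant all-ones feature vector to every entity."""
    return {n: [1.0] * dim for n in nodes}
```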