Featurization

Featurization methods transform textual attributes of entities (e.g. file paths, process command lines, socket IP addresses and ports) into numerical vectors. Some methods, like word2vec and doc2vec, learn these vectors from the text corpus, while others, like hierarchical_hashing, compute them deterministically.
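To make the learned-versus-deterministic distinction concrete, here is a minimal, illustrative sketch of a deterministic featurizer built on the hashing trick. It is not the actual hierarchical_hashing implementation; the tokenizer, the hash function, and the 16-dimensional output are all assumptions chosen for the example:

```python
import hashlib
import re

def path_tokens(path):
    # Split a textual attribute such as a file path into components,
    # e.g. "/usr/bin/python3" -> ["usr", "bin", "python3"].
    return [t for t in re.split(r"[/\\.:]", path) if t]

def hashed_features(tokens, dim=16):
    # Deterministic bag-of-tokens vector via the hashing trick:
    # each token hashes to a fixed bucket, so the same input always
    # yields the same vector and no training corpus is required.
    vec = [0.0] * dim
    for tok in tokens:
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

vec = hashed_features(path_tokens("/usr/bin/python3"))
```

A learned method like word2vec would instead fit an embedding matrix over the whole corpus of such token sequences, so its vectors change with the training data, whereas the hashed vector above is fixed by the input alone.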

Other methods, like only_type and only_ones, skip the embedding step entirely and assign each entity either a one-hot encoding of its type or a vector of ones. These methods therefore take no arguments. In all methods, the resulting vectors are used as node features during the gnn_training task.
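A sketch of what these two trivial featurizers might compute, assuming a hypothetical three-way entity type set (the actual type vocabulary and vector dimension depend on the dataset):

```python
NODE_TYPES = ["process", "file", "socket"]  # assumed type set for illustration

def only_type_features(node_type, types=NODE_TYPES):
    # One-hot encoding of the entity's type; ignores all textual attributes.
    return [1.0 if t == node_type else 0.0 for t in types]

def only_ones_features(dim=3):
    # Constant all-ones vector; carries no per-entity information at all,
    # so the GNN must rely purely on graph structure.
    return [1.0] * dim
```

only_ones is effectively an ablation baseline: if it performs close to a learned embedding, the node text contributes little beyond the graph structure.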

  • word2vec
    • alpha: float
    • window_size: int
    • min_count: int
    • use_skip_gram: bool
    • num_workers: int
    • epochs: int
    • compute_loss: bool
    • negative: int
    • decline_rate: int
  • doc2vec
    • include_neighbors: bool
    • epochs: int
    • alpha: float
  • fasttext
    • min_count: int
    • alpha: float
    • window_size: int
    • negative: int
    • num_workers: int
    • use_pretrained_fb_model: bool
  • alacarte
    • walk_length: int
    • num_walks: int
    • epochs: int
    • context_window_size: int
    • min_count: int
    • use_skip_gram: bool
    • num_workers: int
    • compute_loss: bool
    • add_paths: bool
  • temporal_rw
    • walk_length: int
    • num_walks: int
    • trw_workers: int
    • time_weight: str
    • half_life: int
    • window_size: int
    • min_count: int
    • use_skip_gram: bool
    • wv_workers: int
    • epochs: int
    • compute_loss: bool
    • negative: int
    • decline_rate: int
  • flash
    • min_count: int
    • workers: int
  • hierarchical_hashing
  • magic
  • only_type
  • only_ones
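Walk-based methods such as temporal_rw first generate node sequences and then feed them to a word2vec-style model, which is why their parameter lists combine walk settings (walk_length, num_walks) with embedding settings (window_size, negative, epochs). A minimal sketch of the sequence-generation step, assuming a time-respecting walk over timestamped edges (the graph format and function names here are illustrative, not the tool's API):

```python
import random

def temporal_random_walk(edges, start, walk_length, rng):
    """One temporal walk: each hop must use an edge whose timestamp is
    no earlier than the previous hop's, so walks follow causal order.
    `edges` maps a node to a list of (neighbor, timestamp) pairs."""
    walk = [start]
    current, last_ts = start, float("-inf")
    for _ in range(walk_length - 1):
        candidates = [(n, t) for n, t in edges.get(current, []) if t >= last_ts]
        if not candidates:
            break  # dead end: no time-respecting edge to follow
        current, last_ts = rng.choice(candidates)
        walk.append(current)
    return walk

# Toy provenance-style graph with event timestamps.
edges = {
    "proc:bash": [("file:/etc/passwd", 1), ("proc:curl", 2)],
    "proc:curl": [("socket:10.0.0.1:443", 3)],
}
rng = random.Random(0)
walks = [temporal_random_walk(edges, "proc:bash", 4, rng) for _ in range(3)]
```

Each walk is then treated as a "sentence" of node identifiers for skip-gram training; time_weight and half_life would bias the candidate choice toward recent edges rather than picking uniformly as above.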