Create attack flow (#118)

* implement content-check for muchdogesec/obstracts#131 * --- * version bump * initial --create_attack_flow impl #111 * add boolean to return value * add `incident_classification` to content-check #131 * renaming content check * bump stix2extension version * bumping requirements * Update cases-standard-tests.md * adding better demos * changing flag name * updating tests * add flow objects to main bundle #120 * tuning some extractions --------- Co-authored-by: David G <[email protected]>
muchdogesec · Feb 13, 2025 · 6f92066 · 6f92066
1 parent 747ef1e
commit 6f92066
Show file tree

Hide file tree

Showing 15 changed files with 416 additions and 153 deletions.
diff --git a/README.md b/README.md
@@ -90,7 +90,7 @@ The following arguments are available:
 
 How the extractions are performed
 
-* `--use_extractions` (REQUIRED): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `pattern_*` would run all extractions starting with `pattern_`)
+* `--use_extractions` (REQUIRED): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `'pattern_*'` would run all extractions starting with `pattern_` -- make sure to use quotes when using a wildcard)
 	* Important: if using any AI extractions (`ai_*`), you must set an AI API key in your `.env` file
 	* Important: if you are using any MITRE ATT&CK, CAPEC, CWE, ATLAS or Location extractions you must set `CTIBUTLER` or NVD CPE or CVE extractions you must set `VULMATCH` settings in your `.env` file
 * `--relationship_mode` (REQUIRED): either.
@@ -110,11 +110,13 @@ If any AI extractions, or AI relationship mode is set, you must set the followin
 		* Provider (env var required `OPENAI_API_KEY`): `openai:`, models e.g.: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4` ([More here](https://platform.openai.com/docs/models))
 		* Provider (env var required `ANTHROPIC_API_KEY`): `anthropic:`, models e.g.: `claude-3-5-sonnet-latest`, `claude-3-5-haiku-latest`, `claude-3-opus-latest` ([More here](https://docs.anthropic.com/en/docs/about-claude/models))
 		* Provider (env var required `GOOGLE_API_KEY`): `gemini:models/`, models: `gemini-1.5-pro-latest`, `gemini-1.5-flash-latest` ([More here](https://ai.google.dev/gemini-api/docs/models/gemini))
-		* Provider (env var required `DEEPSEEK_API_KEY`): `deepseek:`, models `deepseek-chat` ([More here](https://api-docs.deepseek.com/quick_start/pricing))		
+		* Provider (env var required `DEEPSEEK_API_KEY`): `deepseek:`, models `deepseek-chat` ([More here](https://api-docs.deepseek.com/quick_start/pricing))
 	* See `tests/manual-tests/cases-ai-extraction-type.md` for some examples
 * `--ai_settings_relationships`:
 	* similar to `ai_settings_extractions` but defines the model used to generate relationships. Only one model can be provided. Passed in same format as `ai_settings_extractions`
 	* See `tests/manual-tests/cases-ai-relationships.md` for some examples
+* `--ai_check_content`: Passing this flag will get the AI to try and classify the text in the input to 1) determine if it is talking about threat intelligence, and 2) what type of threat intelligence it is talking about. For context, we use this to filter out non-threat intel posts in Obstracts and Stixify. You pass `provider:model` with this flag to determine the AI model you wish to use to perform the check.
+* `--ai_create_attack_flow`: passing this flag will also prompt the AI model (the same entered for `--ai_settings_relationships`) to generate an [Attack Flow](https://center-for-threat-informed-defense.github.io/attack-flow/) for the MITRE ATT&CK extractions to define the logical order in which they are being described. You must pass `--ai_settings_relationships` for this to work.
 
 ## Adding new extractions
 

diff --git a/includes/extractions/ai/config.yaml b/includes/extractions/ai/config.yaml
@@ -725,7 +725,7 @@ ai_mitre_attack_enterprise:
   version: 1.0.0
   prompt_base: 'Extract all references to MITRE ATT&CK Enterprise tactics, techniques, groups, data sources, mitigations, software, and campaigns described in the text. These references may not be explicit in the text so you should be careful to account for the natural language of the text your analysis. Do not include MITRE ATT&CK ICS or MITRE ATT&CK Mobile in the results.'
   prompt_helper: 'If you are unsure, you can learn more about MITRE ATT&CK Enterprise here: https://attack.mitre.org/matrices/enterprise/'
-  prompt_conversion: 'Convert all extractions into the corresponding ATT&CK ID.'
+  prompt_conversion: 'You should respond with only the ATT&CK ID.'
   test_cases: ai_mitre_attack_enterprise
   stix_mapping: ctibutler-mitre-attack-enterprise-id
 
@@ -740,7 +740,7 @@ ai_mitre_attack_mobile:
   version: 1.0.0
   prompt_base: 'Extract all references to MITRE ATT&CK Mobile tactics, techniques, groups, data sources, mitigations, software, and campaigns described in the text. These references may not be explicit in the text so you should be careful to account for the natural language of the text your analysis. Do not include MITRE ATT&CK ICS or MITRE ATT&CK Enterprise in the results.'
   prompt_helper: 'If you are unsure, you can learn more about MITRE ATT&CK Enterprise here: https://attack.mitre.org/matrices/mobile/'
-  prompt_conversion: 'Convert all extractions into the corresponding ATT&CK ID.'
+  prompt_conversion: 'You should respond with only the ATT&CK ID.'
   test_cases: ai_mitre_attack_mobile
   stix_mapping: ctibutler-mitre-attack-mobile-id
 
@@ -755,7 +755,7 @@ ai_mitre_attack_ics:
   version: 1.0.0
   prompt_base: 'Extract all references to MITRE ATT&CK ICS tactics, techniques, groups, data sources, mitigations, software, and campaigns described in the text. These references may not be explicit in the text so you should be careful to account for the natural language of the text your analysis. Do not include MITRE ATT&CK Mobile or MITRE ATT&CK Enterprise in the results.'
   prompt_helper: 'If you are unsure, you can learn more about MITRE ATT&CK Enterprise here: https://attack.mitre.org/matrices/ics/'
-  prompt_conversion: 'Convert all extractions into the corresponding ATT&CK ID.'
+  prompt_conversion: 'You should respond with only the ATT&CK ID.'
   test_cases: ai_mitre_attack_ics
   stix_mapping: ctibutler-mitre-attack-ics-id
 
@@ -772,7 +772,7 @@ ai_mitre_capec:
   version: 1.0.0
   prompt_base: 'Extract all references to a MITRE CAPEC object from the text.'
   prompt_helper: 'If you are unsure, you can learn more about MITRE CAPEC here: https://capec.mitre.org/'
-  prompt_conversion: 'Convert all extractions into the corresponding CAPEC ID in the format `CAPEC-ID`'
+  prompt_conversion: 'You should respond with only the CAPEC ID.'
   test_cases: ai_mitre_capec
   stix_mapping: ctibutler-mitre-capec-id
 
@@ -789,7 +789,7 @@ ai_mitre_cwe:
   version: 1.0.0
   prompt_base: 'Extract all references to a MITRE CWE object from the text.'
   prompt_helper: 'If you are unsure, you can learn more about MITRE CAPEC here: https://cwe.mitre.org/'
-  prompt_conversion: 'Convert all extractions into the corresponding CWE ID in the format `CWE-ID`'
+  prompt_conversion: 'You should respond with only the CWE ID.'
   test_cases: ai_mitre_cwe
   stix_mapping: ctibutler-mitre-cwe-id
 

diff --git a/includes/tests/test_cases.yaml b/includes/tests/test_cases.yaml
@@ -492,8 +492,8 @@ ai_mitre_attack_enterprise:
     - 'T1053.005' # attack-pattern--005a06c6-14bf-4118-afa0-ebcd8aebb0c9
     - 'T1040' # attack-pattern--3257eb21-f9a7-4430-8de1-d8b6e288f529 , course-of-action--46b7ef91-4e1d-43c5-a2eb-00fa9444f6f4
     - 'TA0003' # x-mitre-tactic--5bc1d813-693e-4823-9961-abf9af4b0e92
-    - 'Rundll32' # attack-pattern--045d0922-2310-4e60-b5e4-3302302cb3c5
-    - 'OS Credential Dumping' # attack-pattern--0a3ead4e-6d47-4ccb-854c-a6a4f9d96b22
+    # hidden as causes ai to get confused - 'Rundll32' # attack-pattern--045d0922-2310-4e60-b5e4-3302302cb3c5
+    # hidden as causes ai to get confused - 'OS Credential Dumping' # attack-pattern--0a3ead4e-6d47-4ccb-854c-a6a4f9d96b22
   test_negative_examples:
     - 
 
@@ -520,8 +520,8 @@ ai_mitre_attack_mobile:
     - 'S0505' # malware--3271c107-92c4-442e-9506-e76d62230ee8
     - 'T1630.001' # attack-pattern--0cdd66ad-26ac-4338-a764-4972a1e17ee3
     - 'TA0029' # x-mitre-tactic--3e962de5-3280-43b7-bc10-334fbc1d6fa8
-    - 'Impair Defenses' # attack-pattern--20b0931a-8952-42ca-975f-775bad295f1a
-    - 'Call Log' # attack-pattern--1d1b1558-c833-482e-aabb-d07ef6eae63d
+    # hidden as causes ai to get confused - 'Impair Defenses' # attack-pattern--20b0931a-8952-42ca-975f-775bad295f1a
+    # hidden as causes ai to get confused - 'Call Log' # attack-pattern--1d1b1558-c833-482e-aabb-d07ef6eae63d
   test_negative_examples:
     - 
 
@@ -541,8 +541,8 @@ generic_mitre_attack_ics_name:
 ai_mitre_attack_ics:
   test_positive_examples:
     - 'TA0111' # x-mitre-tactic--33752ae7-f875-4f43-bdb6-d8d02d341046
-    - 'Scripting' # attack-pattern--2dc2b567-8821-49f9-9045-8740f3d0b958
-    - 'Program Upload' # attack-pattern--3067b85e-271e-4bc5-81ad-ab1a81d411e3
+    # hidden as causes ai to get confused - 'Scripting' # attack-pattern--2dc2b567-8821-49f9-9045-8740f3d0b958
+    # hidden as causes ai to get confused - 'Program Upload' # attack-pattern--3067b85e-271e-4bc5-81ad-ab1a81d411e3
   test_negative_examples:
 
 ####### MITRE CAPEC #######
@@ -567,8 +567,8 @@ generic_mitre_capec_name:
 ai_mitre_capec:
   test_positive_examples:
     - 'CAPEC-110' # attack-pattern--7c90bef7-530c-427b-8fb7-f9d3eda9c26a
-    - 'Clickjacking' # attack-pattern--ec41b2b3-a3b6-4af0-be65-69e82907dfef
-    - 'Overflow Buffers' # attack-pattern--77e51461-7843-411c-a90e-852498957f76
+    # hidden as causes ai to get confused - 'Clickjacking' # attack-pattern--ec41b2b3-a3b6-4af0-be65-69e82907dfef
+    # hidden as causes ai to get confused - 'Overflow Buffers' # attack-pattern--77e51461-7843-411c-a90e-852498957f76
   test_negative_examples:
     - 
 
@@ -596,8 +596,8 @@ ai_mitre_cwe:
   test_positive_examples:
     - 'CWE-1023' # weakness--c122031a-5735-54f2-a80b-194da3a2c0e6
     - 'CWE-102' # weakness--ad5b3e38-fdf2-5c97-90da-30dad0f1f016
-    - 'Use of Redundant Code' # weakness--6dfb4e56-706d-5243-a3eb-6d4e49b16389
-    - 'Insufficient Encapsulation' # weakness--b0a3b7a9-fefa-5435-8336-4d2e019597f8
+    # hidden as causes ai to get confused - 'Use of Redundant Code' # weakness--6dfb4e56-706d-5243-a3eb-6d4e49b16389
+    # hidden as causes ai to get confused - 'Insufficient Encapsulation' # weakness--b0a3b7a9-fefa-5435-8336-4d2e019597f8
   test_negative_examples:
 
 ####### MITRE ATLAS #######

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "txt2stix"
-version = "0.0.1b5"
+version = "0.0.2"
 authors = [{ name = "DOGESEC", email = "[email protected]" }]
 description = "txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle."
 readme = "README.md"
@@ -23,7 +23,7 @@ dependencies = [
   "requests==2.32.3",
   "python-dotenv>=1.0.1",
   "schwifty>=2024.6.1",
-  "stix2extensions @ https://github.com/muchdogesec/stix2extensions/archive/main.zip",
+  "stix2extensions @ https://github.com/muchdogesec/stix2extensions/releases/download/main-2025-02-12-06-23-37/stix2extensions-0.0.3-py3-none-any.whl",
   "tld>=0.13",
   "tldextract>=5.1.2",
   "validators>=0.28.3",

diff --git a/requirements.txt b/requirements.txt
@@ -51,7 +51,6 @@ sniffio==1.3.1; python_version >= '3.7'
 sqlalchemy==2.0.30; python_version >= '3.7'
 stix2==3.0.1; python_version >= '3.6'
 stix2-patterns==2.0.0; python_version >= '3.6'
-https://github.com/muchdogesec/stix2extensions/archive/main.zip
 tenacity==8.3.0; python_version >= '3.8'
 tiktoken==0.7.0; python_version >= '3.8'
 tld==0.13; python_version >= '3.7' and python_version < '4'
@@ -63,4 +62,5 @@ validators==0.28.3; python_version >= '3.8'
 yarl==1.9.4; python_version >= '3.7'
 zipp==3.19.1; python_version >= '3.8'
 llama-index==0.10.51; python_version >= '3.8'
-base58>=2.1.1; python_version >= '3.8'
+base58>=2.1.1; python_version >= '3.8'
+stix2extensions @ https://github.com/muchdogesec/stix2extensions/releases/download/main-2025-02-12-06-23-37/stix2extensions-0.0.3-py3-none-any.whl
diff --git a/tests/data/manually_generated_reports/attack_flow_demo.txt b/tests/data/manually_generated_reports/attack_flow_demo.txt
@@ -0,0 +1,7 @@
+Victims receive spear phishing emails with from [email protected] malicious zip files attached named badfile.zip
+
+Due to password protection, the zip files are able to bypass some AV detections.
+
+The zip files are extracted and usually contain a malicious document, such as a .doc, .pdf, or .xls. Some examples are malware.pdf and bad.com
+
+The extracted files contain malicious macros that connect to a C2 server 1.1.1.1
diff --git a/tests/manual-tests/cases-standard-tests.md b/tests/manual-tests/cases-standard-tests.md
@@ -362,4 +362,50 @@ python3 txt2stix.py \
 	--confidence 100 \
 	--use_extractions lookup_disarm_name \
 	--report_id 8cb2dbf0-136f-4ecb-995c-095496e22abc
+```
+
+### ai check content
+
+```shell
+python3 txt2stix.py \
+    --relationship_mode standard \
+    --input_file tests/data/extraction_types/all_cases.txt \
+    --name 'Test AI Content check' \
+    --tlp_level clear \
+    --confidence 100 \
+    --use_extractions 'pattern_*' \
+    --ai_content_check openai:gpt-4o \
+    --report_id 4fa18f2d-278b-4fd4-8470-62a8807d35ad
+```
+
+### attack flow demo
+
+no indicators
+
+```shell
+python3 txt2stix.py \
+    --relationship_mode standard \
+    --ai_settings_relationships openai:gpt-4o \
+    --input_file tests/data/manually_generated_reports/attack_flow_demo.txt \
+    --name 'Test MITRE ATT&CK Flow demo' \
+    --tlp_level clear \
+    --confidence 100 \
+    --use_extractions 'ai_mitre_attack_enterprise' \
+    --ai_create_attack_flow \
+    --report_id c0fef67c-720b-4184-a62e-ea465b4d89b5
+```
+
+with indicators
+
+```shell
+python3 txt2stix.py \
+    --relationship_mode standard \
+    --ai_settings_relationships openai:gpt-4o \
+    --input_file tests/data/manually_generated_reports/attack_flow_demo.txt \
+    --name 'Test MITRE ATT&CK Flow demo with iocs' \
+    --tlp_level clear \
+    --confidence 100 \
+    --use_extractions ai_mitre_attack_enterprise,'pattern_*' \
+    --ai_create_attack_flow \
+    --report_id 3b160a8d-12dd-4e7c-aee8-5af6e371b425
 ```