-
-

Log File Analysis

+
+

Python Log File Analysis

Logs contain very detailed information about events happening on computers. And the extra details that they provide come with additional complexity that we need to handle ourselves. A pageview may contain many log lines, and a
@@ -1643,42 +1643,42 @@

Log File Analysis - Data Preparation

referer_path

referer_query

referer_fragment

-

referer_hostname

-

referer_port

referer_dir_1

referer_dir_2

-

referer_dir_3

referer_last_dir

+

referer_hostname

+

referer_port

+

referer_dir_3

0

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

1

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

2

http://adver.tools/

@@ -1687,82 +1687,82 @@

Log File Analysis - Data Preparation

3

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

4

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

5

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

6

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

7

-

-

+

-

-

+

+

+ +

+

nan

-

nan

-

-

-

nan

-

nan

-

-

+

8

http://www.adver.tools/staging/urlytics/

@@ -1771,12 +1771,12 @@

Log File Analysis - Data Preparation

9

http://www.adver.tools/staging/urlytics/

@@ -1785,18 +1785,18 @@

Log File Analysis - Data Preparation

user_agent column.

ua_df = pd.json_normalize([user_agent_parser.Parse(ua) for ua in logs_df['user_agent']])
-ua_df.columns = 'ua_' + ua_df.columns.str.replace('user_agent\.', '', regex=True)
+ua_df.columns = 'ua_' + ua_df.columns.str.replace(r'user_agent.', '', regex=True)
 ua_df.head(10)
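
The two patterns in the hunk above differ only in whether the dot is escaped. The sketch below illustrates what that means for a regex replace, using only the standard library `re` module rather than pandas. The column names are hypothetical, chosen on the assumption that they resemble what `pd.json_normalize` produces from parsed user agents, and `add_prefix` is a helper invented for this illustration.

```python
import re

# Hypothetical column names (an assumption, not taken from the docs output)
columns = ["user_agent.family", "user_agent.major", "os.family", "string"]


def add_prefix(cols, pattern):
    """Strip `pattern` from each name, then prepend 'ua_'."""
    return ["ua_" + re.sub(pattern, "", col) for col in cols]


# Escaped dot: matches the literal text "user_agent." only
escaped = add_prefix(columns, r"user_agent\.")

# Unescaped dot: "." matches any single character after "user_agent"
unescaped = add_prefix(columns, r"user_agent.")

# On these names both patterns give the same result
same_result = escaped == unescaped

# A name with a different separator shows where the two diverge
tricky = ["user_agent_family"]
tricky_escaped = add_prefix(tricky, r"user_agent\.")
tricky_unescaped = add_prefix(tricky, r"user_agent.")
```

With dotted names like these the two patterns behave identically. The unescaped form is simply more permissive, since it would also strip `user_agent_` or `user_agent-` prefixes.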
 
diff --git a/docs/_build/html/advertools.regex.html b/docs/_build/html/advertools.regex.html
index 2cd1d660..ae7df940 100644
--- a/docs/_build/html/advertools.regex.html
+++ b/docs/_build/html/advertools.regex.html
@@ -16,8 +16,8 @@
-
-
+
+
@@ -42,7 +42,7 @@ advertools
- 0.16.3
+ 0.16.4
@@ -72,7 +72,7 @@
  • Crawl Analytics
  • Crawl headers (HEAD method only)
  • Crawl images
-  • Log File Analysis
+  • Python Log File Analysis
  • Parse and Analyze Crawl Logs in a Dataframe
  • Reverse DNS Lookup
  • Analyze Search Engine Results (SERPs)
diff --git a/docs/_build/html/advertools.reverse_dns_lookup.html b/docs/_build/html/advertools.reverse_dns_lookup.html
index 3450eb30..5b212a29 100644
--- a/docs/_build/html/advertools.reverse_dns_lookup.html
+++ b/docs/_build/html/advertools.reverse_dns_lookup.html
@@ -16,8 +16,8 @@
-
-
+
+
@@ -27,7 +27,7 @@
-
+
@@ -42,7 +42,7 @@ advertools
- 0.16.3
+ 0.16.4