diff --git a/.idea/.gitignore b/.idea/.gitignore
new file mode 100644
index 0000000..0a8642f
--- /dev/null
+++ b/.idea/.gitignore
@@ -0,0 +1,10 @@
+# Default ignored files
+/shelf/
+/workspace.xml
+# Editor-based HTTP Client requests
+/httpRequests/
+# Datasource local storage ignored files
+/dataSources/
+/dataSources.local.xml
+# Zeppelin ignored files
+/ZeppelinRemoteNotebooks/
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1d1e6a0
--- /dev/null
+++ b/README.md
@@ -0,0 +1,59 @@
+# Redact your Logs in Python
+
+## Introduction
+
+It is often necessary to hide the values of certain attributes in a String. This process is called REDACTION.
+
+There are several ways to do this in a JSON or XML String, and many tutorials cover them, but I came across a different case.
+
+Although part of the String looks like a JSON structure, the String can come from an Error Log, which breaks the JSON structure, so the code I found does not work well in these situations.
+
+I found a really cool tutorial:
+
+[Secure Logging in Python: Redacting Sensitive Data with Filters](https://levelup.gitconnected.com/secure-logging-in-python-redacting-sensitive-data-with-filters-d49bd401c53)
+
+This material provides code based on REGEX patterns, so it works even when the JSON (or any other) structure is poorly formatted.
+
+However, the material focuses on value formats that a REGEX can match, such as dates, credit card numbers, telephone numbers and other String formatting patterns.
+
+But what about when we need to hide specific attributes?
+
+I didn't find anything about it, so I decided to write code that adds this form of REDACTION to the tutorial above.
+
+## Understand the Code
+
+The code has two aspects:
+
+- Act based on REGEX patterns
+
+![img.png](images/img.png)
+
+- Act based on attribute names
+
+![img_1.png](images/img_1.png)
+
+We can then, from a list of Strings (Logs for example), have a poorly formatted String:
+
+![img_2.png](images/img_2.png)
+
+What the algorithm performs:
+
+- The algorithm looks for REGEX patterns. No matter what the attribute is named, or whether the value is associated with an attribute at all (as in a JSON structure), the algorithm always replaces the matched value with ****.
+- The algorithm searches for each attribute name in the list (ATTRIBUTE_PATTERNS). The attribute needs to appear in the String in this format:
+
+
+    '': ''
+    the first '' holds the attribute name
+    : needs to follow it, as in a JSON association
+    the second '' holds the value; yes, the value needs to be a String
+
+So you can use the class **Redaction.py** in your code, like this:
+
+![img_3.png](images/img_3.png)
+
+## Test
+
+Execute the Python code **redact.py**
+
+![img_4.png](images/img_4.png)
+
diff --git a/images/img.png b/images/img.png
new file mode 100644
index 0000000..874e297
Binary files /dev/null and b/images/img.png differ
diff --git a/images/img_1.png b/images/img_1.png
new file mode 100644
index 0000000..a5e654a
Binary files /dev/null and b/images/img_1.png differ
diff --git a/images/img_2.png b/images/img_2.png
new file mode 100644
index 0000000..47efd2d
Binary files /dev/null and b/images/img_2.png differ
diff --git a/images/img_3.png b/images/img_3.png
new file mode 100644
index 0000000..60047fe
Binary files /dev/null and b/images/img_3.png differ
diff --git a/images/img_4.png b/images/img_4.png
new file mode 100644
index 0000000..6824d5d
Binary files /dev/null and b/images/img_4.png differ