Mirror of https://github.com/hoshikawa2/python_redaction.git, synced 2026-03-06 18:21:00 +00:00

Commit message: documentation

.idea/.gitignore (new file, 10 lines)

```
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
# Zeppelin ignored files
/ZeppelinRemoteNotebooks/
```

README.md (new file, 59 lines)

# Redact your Logs in Python

## Introduction

It is often necessary to hide the values of certain attributes in a string. This process is called REDACTION.

There are several ways to do this in a JSON or XML string, and many tutorials show how, but I came across a different case.

Although part of the string looks like a JSON structure, the string can come from an error log, which breaks the JSON structure, and the code I found doesn't work well in these situations.

I found a really cool tutorial:

[Secure Logging in Python: Redacting Sensitive Data with Filters](https://levelup.gitconnected.com/secure-logging-in-python-redacting-sensitive-data-with-filters-d49bd401c53)

This material provides code based on REGEX patterns, so it works even when the JSON (or any other) structure is poorly formatted.

However, the material is aimed at REGEX patterns that describe value formats, such as dates, credit card numbers, telephone numbers and other string formatting patterns.
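For illustration, here is a minimal sketch of pattern-based redaction in plain Python. The dictionary `REGEX_PATTERNS` and the function `redact_patterns` are hypothetical names, and the regexes are simplified assumptions (this is not the tutorial's code); real card and phone formats vary much more. The point it shows is that only the raw text is scanned, so the log line does not need to be valid JSON.

```python
import re

# Hypothetical, deliberately simplified value patterns (an assumption for this
# sketch, not the tutorial's code). Dict order is the order they are applied.
REGEX_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

REDACTED = "<REDACTED>"


def redact_patterns(text: str) -> str:
    """Replace every match of every value pattern with <REDACTED>.

    Only the raw text is scanned, so it does not matter whether the
    surrounding structure is valid JSON, XML or a broken error-log line.
    """
    for pattern in REGEX_PATTERNS.values():
        text = pattern.sub(REDACTED, text)
    return text


if __name__ == "__main__":
    # Half JSON, half error log: json.loads() would reject this line.
    log = "ERROR payment failed {'card': '4111 1111 1111 1111', 'date': '2024-01-31' <- truncated"
    print(redact_patterns(log))
    # Output: ERROR payment failed {'card': '<REDACTED>', 'date': '<REDACTED>' <- truncated
```

Note that the attribute names (`card`, `date`) play no role here: any value that matches a pattern is hidden, and any value that does not match stays visible.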

But what about when we need to hide specific attributes?

I didn't find anything about it, so I decided to write code that adds this form of REDACTION to the approach from the tutorial above.

## Understand the Code

The code has two aspects:

- Act based on REGEX patterns

- Act based on attribute names

Given a list of strings (logs, for example), we can then have a poorly formatted string:

What the algorithm does:

- The algorithm looks for REGEX patterns. Regardless of the attribute name, or whether the value belongs to an attribute at all (in a JSON structure), the algorithm always replaces a matching value with `<REDACTED>`.
- The algorithm searches for the attribute names listed in `ATTRIBUTE_PATTERNS`. Each attribute must appear in the format `'<name>': '<value>'`, where:
  - `<name>` is the attribute name;
  - `:` is the separator, as in a JSON key/value association;
  - `<value>` is the value between single quotes (yes, the value needs to be a string).
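A minimal sketch of attribute-name redaction, assuming the `'<name>': '<value>'` format described above. The entries in `ATTRIBUTE_PATTERNS` and the helper `redact_attributes` are hypothetical and may not match the repository's own names or list.

```python
import re

# Hypothetical list of attribute names whose values must be hidden
# (an assumption for this sketch; the repository's list may differ).
ATTRIBUTE_PATTERNS = ["password", "token", "api_key"]

REDACTED = "<REDACTED>"


def _attribute_regex(name: str) -> "re.Pattern[str]":
    # Matches  '<name>': '<value>'  with optional spaces around the colon.
    # Group 1 keeps the quoted name, the colon and the value's opening quote;
    # group 2 keeps the closing quote, so only the value itself is replaced.
    return re.compile(r"('" + re.escape(name) + r"'\s*:\s*')[^']*(')")


ATTRIBUTE_REGEXES = [_attribute_regex(name) for name in ATTRIBUTE_PATTERNS]


def redact_attributes(text: str) -> str:
    """Hide the string value of every listed attribute, wherever it appears."""
    for regex in ATTRIBUTE_REGEXES:
        text = regex.sub(lambda m: m.group(1) + REDACTED + m.group(2), text)
    return text


if __name__ == "__main__":
    # The JSON-like part is cut off by the error message, so it cannot be parsed.
    log = ("ERROR login failed: {'user': 'ana', 'password': 's3cr3t', "
           "'token': 'abc123' Connection reset by peer")
    print(redact_attributes(log))
    # Output: ERROR login failed: {'user': 'ana', 'password': '<REDACTED>', 'token': '<REDACTED>' Connection reset by peer
```

Only the quoted value is replaced, so the attribute name stays visible for debugging. In line with the format rule above, unquoted values such as `'pin': 1234` are left untouched, and this simple regex does not handle values that contain an escaped single quote.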

So you can use the class from **Redaction.py** in your code, like this:

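One way such a class could be used, sketched under assumptions: the class name `Redaction`, its constructor arguments and its `redact` method are hypothetical, not the actual interface of **Redaction.py**. The sketch applies it to a list of strings and also attaches it to Python's standard `logging` module through a `logging.Filter` subclass (`RedactingFilter`, also hypothetical), in the spirit of the filter-based tutorial linked above.

```python
import logging
import re


class Redaction:
    """Hypothetical redactor combining value patterns and attribute names.

    The name and constructor are assumptions for this sketch, not the real
    interface of Redaction.py.
    """

    REDACTED = "<REDACTED>"

    def __init__(self, regex_patterns, attribute_names):
        # Value formats (such as dates or card numbers) redacted wherever they appear.
        self._value_regexes = [re.compile(p) for p in regex_patterns]
        # Attributes written as '<name>': '<value>'; only the value is hidden.
        self._attribute_regexes = [
            re.compile(r"('" + re.escape(name) + r"'\s*:\s*')[^']*(')")
            for name in attribute_names
        ]

    def redact(self, text: str) -> str:
        for regex in self._attribute_regexes:
            text = regex.sub(lambda m: m.group(1) + self.REDACTED + m.group(2), text)
        for regex in self._value_regexes:
            text = regex.sub(self.REDACTED, text)
        return text


class RedactingFilter(logging.Filter):
    """Hypothetical logging.Filter that redacts each record before it is emitted."""

    def __init__(self, redaction: Redaction):
        super().__init__()
        self._redaction = redaction

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge the %-style arguments into the message first, then redact it.
        record.msg = self._redaction.redact(record.getMessage())
        record.args = None  # already merged; prevents a second formatting pass
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    redaction = Redaction(
        regex_patterns=[r"\b\d{4}-\d{2}-\d{2}\b"],
        attribute_names=["password"],
    )

    # 1) Redact a list of strings, e.g. lines read from a log file.
    for line in ["{'user': 'ana', 'password': 'x1' at 2024-01-31", "no secrets here"]:
        print(redaction.redact(line))

    # 2) Redact everything that goes through a logger.
    logger = logging.getLogger("app")
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter(redaction))
    logger.addHandler(handler)
    logger.error("login failed: %s", "{'password': 'hunter2'")
```

Calling `record.getMessage()` inside the filter merges the format arguments into the message before redaction, so secrets passed as `%s` arguments are also hidden; `record.args` is then cleared so the handler does not try to format the already redacted message a second time.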

## Test

Execute the Python script **redact.py**.

Binary files added (previews not shown):

- images/img.png (63 KiB)
- images/img_1.png (34 KiB)
- images/img_2.png (121 KiB)
- images/img_3.png (258 KiB)
- images/img_4.png (246 KiB)