SecHard
DNS Record Comparison and Change Detection System Technical Documentation
1. Purpose
This system detects changes in DNS records and reports them. It compares the DNS records captured before and after an update and classifies each record as newly added, updated, deleted, or unchanged. During the comparison, Natural Language Processing (NLP) techniques are used to calculate text similarity, so that records are still matched when they contain spelling errors.
2. Technical Details
2.1. Libraries and Methods Used
Natural Library: Used to calculate text similarity. Specifically, the Jaro-Winkler distance algorithm is used to compute a similarity score tolerant of spelling errors.
Text Preprocessing: Before similarity calculation, DNS record texts are normalized:
Conversion to lowercase (so that comparison is case-insensitive).
Removal of punctuation marks.
Removal of extra spaces.
Weighted Similarity Calculation: Each field of a DNS record (zone, type, hostname, value) is assigned its own weight, and the total similarity score is the weighted sum of the per-field similarity scores.
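Written as a formula, with w_f the weight of field f (the weights are listed under calculateSimilarity in Section 2.2) and JW the Jaro-Winkler similarity of the normalized field values:

$$
S(a, b) = \sum_{f \in \{\text{zone},\,\text{type},\,\text{hostname},\,\text{value}\}} w_f \cdot JW(a_f, b_f), \qquad \sum_f w_f = 1
$$

Because the weights sum to 1 and each JW value lies in [0, 1], S also lies in [0, 1], which is what allows a single fixed threshold such as 0.75 to be applied to every record pair.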
2.1.1. Jaro-Winkler Distance
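Jaro-Winkler distance is an algorithm that measures the similarity between two text strings. Despite the name, it yields a similarity score between 0 and 1, where 1 means the strings are identical. It first calculates the Jaro distance and then raises this value for strings that share a common prefix.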
2.1.1.1. Finding Matching Characters
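Identifies matching characters (letters, digits, etc.) between the two texts.
Two characters match only if they are identical and their positions differ by no more than ⌊max(|s1|, |s2|) / 2⌋ − 1, where |s1| and |s2| are the lengths of the two texts; the allowed distance therefore grows with the length of the texts.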
2.1.1.2. Calculating Transpositions
If matching characters appear in a different order in the two texts, this is counted as a transposition.
The number of transpositions (t) is half the number of positions at which the two ordered sequences of matching characters differ.
2.1.1.3. Jaro Distance Formula
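The Jaro distance (j) is calculated as follows:
$$
j =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise}
\end{cases}
$$
Where:
m: the number of matching characters.
|s1| and |s2|: the lengths of the two strings.
t: the number of transpositions.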
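As a worked illustration (a standard textbook pair, not DNS data), take s1 = MARTHA and s2 = MARHTA. Both have length 6, so the match window is ⌊6/2⌋ − 1 = 2 and all six characters match (m = 6). The ordered matched sequences MARTHA and MARHTA differ at two positions (T/H and H/T), so t = 2/2 = 1:

$$
j = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) = \frac{17}{18} \approx 0.944
$$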
2.1.2. Jaro-Winkler Enhancement
Jaro-Winkler extends the Jaro distance with a prefix bonus: strings that begin with the same characters receive a higher score.
2.1.2.1. Prefix Length (l)
The number of consecutive identical characters at the start of both texts (the length of their common prefix).
Typically limited to a maximum of 4 characters.
2.1.2.2. Scaling Factor (p)
Determines how much the common prefix contributes to the final score.
Usually set to p = 0.1. It should not exceed 0.25: with l capped at 4, a larger p could push the score above 1.
2.1.2.3. Jaro-Winkler Distance Formula
The Jaro-Winkler distance (JW) is calculated by the following formula:
$$
JW = j + l \cdot p \cdot (1 - j)
$$
Where:
j: the Jaro distance.
l: the prefix length (at most 4).
p: the scaling factor (typically 0.1).
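A minimal JavaScript sketch of the steps in Sections 2.1.1 and 2.1.2 (match window, transpositions, Jaro score, prefix bonus). JavaScript is chosen on the assumption that the Natural library is the `natural` package for Node.js. This is a from-scratch stand-in written for illustration, not the Natural library's code; the function names jaro and jaroWinkler are hypothetical, and Natural's own implementation may differ in edge-case handling.

```javascript
// Illustrative Jaro score (standard definition); not the Natural library's code.
function jaro(s1, s2) {
  if (s1 === s2) return 1;
  if (s1.length === 0 || s2.length === 0) return 0;

  // Characters match only if equal and at most this many positions apart.
  const matchWindow = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const matched1 = new Array(s1.length).fill(false);
  const matched2 = new Array(s2.length).fill(false);

  let m = 0;
  for (let i = 0; i < s1.length; i++) {
    const lo = Math.max(0, i - matchWindow);
    const hi = Math.min(s2.length - 1, i + matchWindow);
    for (let k = lo; k <= hi; k++) {
      if (!matched2[k] && s1[i] === s2[k]) {
        matched1[i] = true;
        matched2[k] = true;
        m++;
        break;
      }
    }
  }
  if (m === 0) return 0;

  // Count positions where the ordered matched characters disagree; t is half of that.
  let mismatches = 0;
  let pos = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[pos]) pos++;
    if (s1[i] !== s2[pos]) mismatches++;
    pos++;
  }
  const t = mismatches / 2;

  return (m / s1.length + m / s2.length + (m - t) / m) / 3;
}

// Jaro-Winkler: JW = j + l * p * (1 - j), with the prefix length l capped at 4.
function jaroWinkler(s1, s2, p = 0.1) {
  const j = jaro(s1, s2);
  let l = 0;
  while (l < 4 && l < s1.length && l < s2.length && s1[l] === s2[l]) l++;
  return j + l * p * (1 - j);
}
```

With this sketch, jaroWinkler("MARTHA", "MARHTA") is about 0.961, jaroWinkler("www", "ww") is about 0.911, and jaroWinkler("example com", "exmple com") is about 0.976.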
2.2. Core Functions
preprocessText(text):
Normalizes texts.
Example: "Example.COM." → "example com"
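A minimal sketch of preprocessText. It assumes punctuation is replaced by a space rather than deleted, since that is what the example output "example com" implies, and it assumes "punctuation" means any character that is not a letter, digit, or whitespace; the real character set may differ.

```javascript
// Sketch of preprocessText; the exact punctuation set is an assumption.
function preprocessText(text) {
  return String(text ?? "")
    .toLowerCase()                      // case normalization
    .replace(/[^\p{L}\p{N}\s]/gu, " ")  // punctuation -> space (assumed)
    .replace(/\s+/g, " ")               // collapse repeated whitespace
    .trim();                            // drop leading/trailing spaces
}
```

Under this assumption, hyphenated hostnames such as "mail-01" also become "mail 01", which is harmless for similarity scoring as long as both records are normalized the same way.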
calculateSimilarity(a, b):
Computes a similarity score between 0 and 1 for two DNS records.
Uses separate weights for each field:
Zone: 40%
Type: 30%
Hostname: 20%
Value: 10%
Uses the Jaro-Winkler distance to score each field.
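The weighting can be sketched as follows. The record shape ({ zone, type, hostname, value }) follows the field list above; the stringSimilarity parameter is a hypothetical injection point standing in for Jaro-Winkler on preprocessed text (in the described system, presumably the Natural library's JaroWinklerDistance), so the sketch runs without external packages.

```javascript
// Field weights from the documentation; they sum to 1, so the score stays in [0, 1].
const FIELD_WEIGHTS = { zone: 0.4, type: 0.3, hostname: 0.2, value: 0.1 };

// Sketch: weighted sum of per-field similarities.
// `stringSimilarity(x, y)` is a hypothetical parameter returning a value in [0, 1].
function calculateSimilarity(a, b, stringSimilarity) {
  let total = 0;
  for (const [field, weight] of Object.entries(FIELD_WEIGHTS)) {
    const x = String(a[field] ?? "");
    const y = String(b[field] ?? "");
    total += weight * stringSimilarity(x, y);
  }
  return total;
}
```

Injecting the similarity function keeps the weighting logic testable on its own; for example, an exact-match function scores two records that differ only in type at 0.7.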
enhancedEqualityCheck({ oldRecord, newRecord }):
Checks if two records are identical.
For SOA records, compares the serial numbers.
Compares value, TTL, and priority for other record types.
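A hedged sketch of the equality rule. It assumes records carry type, value, ttl and priority fields, that records of different types are never equal, that the SOA comparison looks only at the serial, and that the serial is the third whitespace-separated token of the SOA value (the standard MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM order). None of these details are specified in this document, and soaSerial is a hypothetical helper.

```javascript
// Hypothetical helper: extract the serial from an SOA value string such as
// "ns1.example.com. hostmaster.example.com. 2024010101 7200 3600 1209600 3600".
function soaSerial(value) {
  const parts = String(value ?? "").trim().split(/\s+/);
  return parts.length >= 3 ? parts[2] : null;
}

function enhancedEqualityCheck({ oldRecord, newRecord }) {
  // Assumption: a type change always counts as a difference.
  if (oldRecord.type !== newRecord.type) return false;

  // SOA: only the serial number decides equality (assumed rule).
  if (String(oldRecord.type).toUpperCase() === "SOA") {
    const serial = soaSerial(oldRecord.value);
    return serial !== null && serial === soaSerial(newRecord.value);
  }

  // Other types: value, TTL and priority must all be equal (missing priority = null).
  return (
    oldRecord.value === newRecord.value &&
    oldRecord.ttl === newRecord.ttl &&
    (oldRecord.priority ?? null) === (newRecord.priority ?? null)
  );
}
```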
getDnsRecords(resourceId):
Retrieves DNS records for a specific resource from the database.
3. Workflow
Fetching Previous Records:
Retrieves the DNS records as they exist before the update, using assetsBeforeUpdate.
Monitoring Process:
Runs specific recipes to update DNS records.
Fetching Updated Records:
Retrieves updated DNS records using assetsAfterUpdate.
Change Analysis:
Compares previous and updated records.
Pairs of previous and updated records whose similarity score is above 0.75 are treated as the same record (a match).
Changes are categorized into four types:
created: Newly added records.
updated: Modified records.
deleted: Removed records.
unchanged: Records that remain the same.
Change Notification:
Sends a notification via MessageService if changes are detected.
Returning Results:
Returns results through QueueManager.
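The change analysis step can be sketched as below. The document does not specify the pairing strategy, so this sketch assumes a greedy one: each previous record is paired with its best-scoring, not-yet-used updated record if that score is strictly above the threshold; matched pairs are unchanged or updated depending on an equality check, unmatched previous records are deleted, and unmatched updated records are created. The similarity and isEqual parameters are hypothetical stand-ins for calculateSimilarity and enhancedEqualityCheck.

```javascript
// Sketch of the change analysis; greedy best-match pairing is an assumption.
function classifyChanges(before, after, similarity, isEqual, threshold = 0.75) {
  const result = { created: [], updated: [], deleted: [], unchanged: [] };
  const used = new Set(); // indexes of `after` records already paired

  for (const oldRecord of before) {
    let bestIndex = -1;
    let bestScore = threshold; // a match must score strictly above the threshold
    after.forEach((newRecord, i) => {
      if (used.has(i)) return;
      const score = similarity(oldRecord, newRecord);
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });

    if (bestIndex === -1) {
      result.deleted.push(oldRecord); // no counterpart among the updated records
      continue;
    }
    used.add(bestIndex);
    const newRecord = after[bestIndex];
    if (isEqual({ oldRecord, newRecord })) result.unchanged.push(newRecord);
    else result.updated.push({ oldRecord, newRecord });
  }

  // Any updated record that was never paired is treated as newly created.
  after.forEach((newRecord, i) => {
    if (!used.has(i)) result.created.push(newRecord);
  });
  return result;
}
```

Greedy pairing is simple but order-dependent: an early previous record can claim an updated record that would have fit a later one better. An optimal assignment avoids this at a higher computational cost.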
4. Contribution of NLP
Tolerance to Spelling Errors: "example.com" and "exmple.com" receive a high similarity score.
Case Insensitivity: "EXAMPLE.COM" and "example.com" are treated as exact matches.
Tolerance to Punctuation Marks: "example.com." and "example.com" are considered identical.
5. Example Scenario
Old Record:
{ "zone": "example.com", "hostname": "www", "value": "192.168.1.1" }
New Record:
{ "zone": "exmple.com", "hostname": "ww", "value": "192.168.1.1" }
Similarity Calculation (the type field is not shown and is treated as identical in both records):
Zone: "example.com" vs "exmple.com" → 88%
Hostname: "www" vs "ww" → 83%
Total Score: (0.4*0.88) + (0.3*1) + (0.2*0.83) + (0.1*1) = 0.918
Result: Match accepted (0.918 > 0.75).
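For comparison, the standard Jaro-Winkler definition from Section 2.1 (p = 0.1, prefix capped at 4) gives higher per-field values than the percentages quoted above. For the zone, ten of the eleven characters of "example.com" match (only the "a" has no partner), with no transpositions and the common prefix "ex" (l = 2). For the hostname, both characters of "ww" match, with t = 0 and l = 2. Assuming the type and value similarities are 1:

$$
\begin{aligned}
JW_{\text{zone}} &= \tfrac{32}{33} + 2 \cdot 0.1 \cdot \left(1 - \tfrac{32}{33}\right) \approx 0.976 \\
JW_{\text{hostname}} &= \tfrac{8}{9} + 2 \cdot 0.1 \cdot \left(1 - \tfrac{8}{9}\right) \approx 0.911 \\
S &\approx 0.4 \times 0.976 + 0.3 \times 1 + 0.2 \times 0.911 + 0.1 \times 1 \approx 0.973
\end{aligned}
$$

The pair still clears the 0.75 threshold.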
6. Performance and Optimization
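Large Datasets: Comparing every previous record with every updated record requires a number of similarity computations proportional to the product of the two list sizes; grouping records by zone and comparing only within each group reduces this.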
Caching: Frequently used records can be cached for efficiency.
Threshold Adjustment: SIMILARITY_THRESHOLD value can be dynamically adjusted based on the dataset.
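A minimal sketch of zone grouping. The function name groupByZone and the key normalization (lowercase, trailing dot removed) are assumptions for illustration.

```javascript
// Sketch: bucket records by a normalized zone key so that similarity is only
// computed between records in the same bucket. Key normalization is assumed.
function groupByZone(records) {
  const groups = new Map();
  for (const record of records) {
    const key = String(record.zone ?? "").toLowerCase().replace(/\.$/, "");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  return groups;
}
```

Previous and updated records would then be compared only within buckets that share a key, so the work drops from one comparison per pair overall to one per pair within each zone. The trade-off is that a zone typo, such as "exmple.com" in the example scenario, lands in a different bucket and would no longer be matched.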
7. Dependencies
Natural Library: Used for text similarity calculations.
8. Conclusion
This system detects DNS record changes and, using NLP-based string similarity, matches records even when they contain spelling errors. It adds this more tolerant matching mechanism without altering the output format of the existing system.