DNS Record Comparison and Change Detection System Technical Documentation

SecHard

1. Purpose

This system detects changes in DNS records and reports them. By comparing the DNS records captured before and after an update, it classifies each record as newly added, updated, deleted, or unchanged. During the comparison, Natural Language Processing (NLP) techniques are used to calculate text similarity, so records can still be matched when they contain spelling errors.


2. Technical Details

2.1. Libraries and Methods Used

  • Natural Library: Used to calculate text similarity. Specifically, the Jaro-Winkler distance algorithm is used to compute a similarity score tolerant of spelling errors.

  • Text Preprocessing: Before similarity calculation, DNS record texts are normalized:

    • Conversion to lowercase.

    • Removal of punctuation marks.

    • Removal of extra spaces.

  • Weighted Similarity Calculation: Different fields of DNS records (zone, type, hostname, value) are assigned separate weights, and the total similarity score is calculated using these weights.
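The text preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the system's actual code. Replacing punctuation with a space rather than deleting it is an assumption, chosen because it reproduces the "example com" result given for preprocessText later in this document.

```javascript
// Hypothetical sketch of the normalization steps (not the production code).
function preprocessText(text) {
  return String(text)
    .toLowerCase()                      // make the comparison case-insensitive
    .replace(/[^\p{L}\p{N}\s]/gu, " ")  // assumption: punctuation becomes a space
    .replace(/\s+/g, " ")               // collapse runs of whitespace
    .trim();                            // drop leading and trailing spaces
}

console.log(preprocessText("Example.COM.")); // "example com"
```

Note that under this assumption a hyphen inside a hostname is also turned into a space; whether that is desirable depends on the data.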

2.1.1. Jaro-Winkler Distance

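Jaro-Winkler distance is an algorithm used to measure the similarity between two text strings. Despite the name, the result is a similarity score between 0 and 1, where 1 means the strings are identical. It first calculates the Jaro distance and then boosts this value for strings that share a common prefix.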

2.1.1.1. Finding Matching Characters
  • Identifies matching characters (letters, numbers, etc.) between two texts.

  • Two characters count as matching only if they are identical and their positions differ by at most

    $$\left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1$$

    where |s_1| and |s_2| are the lengths of the two texts.

2.1.1.2. Calculating Transpositions
  • If the order of matching characters differs, this is considered a transposition.

  • The number of transpositions (t) is half the number of matching characters that appear in a different order in the two texts.
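The matching and transposition steps can be sketched as a single function. The name matchStats is hypothetical, and the matching window used here is the standard Jaro definition, floor(max(len1, len2) / 2) - 1, which is assumed rather than taken from the system's code.

```javascript
// Hypothetical helper: counts matching characters (m) and transpositions (t).
function matchStats(s1, s2) {
  // Standard Jaro matching window (assumption); never negative.
  const window = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const used1 = new Array(s1.length).fill(false);
  const used2 = new Array(s2.length).fill(false);
  let m = 0;

  // Pass 1: pair each character of s1 with the first unused equal
  // character of s2 that lies inside the window.
  for (let i = 0; i < s1.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(s2.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!used2[j] && s1[i] === s2[j]) {
        used1[i] = true;
        used2[j] = true;
        m++;
        break;
      }
    }
  }

  // Pass 2: walk the matched characters of both strings in order and
  // count positions where they disagree; t is half of that count.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!used1[i]) continue;
    while (!used2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  return { m, t: outOfOrder / 2 };
}

console.log(matchStats("martha", "marhta")); // { m: 6, t: 1 }
```

In "martha" vs "marhta" all six characters match, but "t" and "h" are swapped, which gives two out-of-order characters and therefore one transposition.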

2.1.1.3. Jaro Distance Formula

The Jaro distance (j) is calculated as follows:

$$j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)$$

If m = 0, the Jaro distance is 0.

Where:

  • m: is the number of matching characters.

  • |s_1| and |s_2|: are the lengths of the two strings.

  • t: is the number of transpositions.
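As a small sketch, the Jaro formula can be evaluated from the quantities defined above. The function name jaroFromCounts and the parameter names len1 and len2 (the two string lengths) are hypothetical.

```javascript
// Hypothetical helper: Jaro score from match count m, transpositions t,
// and the two string lengths.
function jaroFromCounts(m, t, len1, len2) {
  if (m === 0) return 0; // no matching characters: score is 0 by definition
  return (m / len1 + m / len2 + (m - t) / m) / 3;
}

// "martha" vs "marhta": m = 6, t = 1, both strings have length 6.
console.log(jaroFromCounts(6, 1, 6, 6).toFixed(4)); // "0.9444"
```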

 

2.1.2. Jaro-Winkler Enhancement

Jaro-Winkler enhances the Jaro distance with prefix similarity, focusing on how similar the beginning parts of two texts are.

2.1.2.1. Prefix Length (l)

  • The number of matching characters at the start of the two texts.

  • Typically limited to a maximum of 4 characters.

2.1.2.2. Scaling Factor (p)

  • Determines the contribution of prefix similarity to the Jaro distance.

  • Usually set to p = 0.1.

2.1.2.3. Jaro-Winkler Distance Formula

The Jaro-Winkler distance (JW) is calculated by the following formula:

$$JW = j + l \cdot p \cdot (1 - j)$$

Where:

  • j: is the Jaro distance.

  • l: is the prefix length.

  • p: is the scaling factor (typically 0.1).
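The prefix adjustment can be sketched as a function that takes an already computed Jaro score j and the two strings. The name applyWinklerBoost and the maxPrefix parameter are hypothetical; the defaults follow the typical values stated above (p = 0.1, at most 4 prefix characters).

```javascript
// Hypothetical helper: applies the Jaro-Winkler prefix boost to a Jaro score j.
function applyWinklerBoost(j, s1, s2, p = 0.1, maxPrefix = 4) {
  // l = length of the common prefix, capped at maxPrefix characters.
  const limit = Math.min(maxPrefix, s1.length, s2.length);
  let l = 0;
  while (l < limit && s1[l] === s2[l]) l++;
  return j + l * p * (1 - j);
}

// "martha" vs "marhta": j = 17/18, common prefix "mar" gives l = 3.
console.log(applyWinklerBoost(17 / 18, "martha", "marhta").toFixed(4)); // "0.9611"
```

Because the boost is proportional to (1 - j), it never pushes the score above 1 as long as l * p stays at or below 1.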

 

2.2. Core Functions

  • preprocessText(text):

    • Normalizes texts.

    • Example: "Example.COM." → "example com"

  • calculateSimilarity(a, b):

    • Computes similarity between two DNS records.

    • Uses separate weights for each field:

      • Zone: 40%

      • Type: 30%

      • Hostname: 20%

      • Value: 10%

    • Uses Jaro-Winkler distance for similarity scoring.

  • enhancedEqualityCheck({ oldRecord, newRecord }):

    • Checks if two records are identical.

    • Applies a dedicated serial number comparison for SOA records.

    • Compares value, TTL, and priority for other record types.

  • getDnsRecords(resourceId):

    • Retrieves DNS records for a specific resource from the database.
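The equality check described above might look like the following sketch. The field names type, serial, value, ttl, and priority are assumptions about the record shape, as is the rule that two SOA records are treated as equal when their serial numbers match; the real function may differ.

```javascript
// Hypothetical sketch of enhancedEqualityCheck; record field names are assumed.
function enhancedEqualityCheck({ oldRecord, newRecord }) {
  // Assumption: records of different types are never identical.
  if (oldRecord.type !== newRecord.type) return false;

  // Assumption: SOA records are compared by serial number only.
  if (oldRecord.type === "SOA") {
    return String(oldRecord.serial) === String(newRecord.serial);
  }

  // Other record types: value, TTL, and priority must all agree.
  // A missing field is treated as null on both sides.
  return ["value", "ttl", "priority"].every(
    (field) => (oldRecord[field] ?? null) === (newRecord[field] ?? null)
  );
}

const a = { type: "A", value: "192.168.1.1", ttl: 300 };
console.log(enhancedEqualityCheck({ oldRecord: a, newRecord: { ...a } }));           // true
console.log(enhancedEqualityCheck({ oldRecord: a, newRecord: { ...a, ttl: 600 } })); // false
```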


3. Workflow

  1. Fetching Previous Records:

    • Retrieves current DNS records using assetsBeforeUpdate.

  2. Monitoring Process:

    • Runs specific recipes to update DNS records.

  3. Fetching Updated Records:

    • Retrieves updated DNS records using assetsAfterUpdate.

  4. Change Analysis:

    • Compares previous and updated records.

    • Records with a similarity score above 0.75 (the SIMILARITY_THRESHOLD) are considered matches.

    • Changes are categorized into four types:

      • created: Newly added records.

      • updated: Modified records.

      • deleted: Removed records.

      • unchanged: Records that remain the same.

  5. Change Notification:

    • Sends a notification via MessageService if changes are detected.

  6. Returning Results:

    • Returns results through QueueManager.
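The change analysis step can be sketched as follows. The function classifyChanges and its parameters are hypothetical: the similarity function and equality check are passed in so the sketch stays self-contained, and the greedy "best match above the threshold" strategy is an assumption about how matches are selected.

```javascript
// Hypothetical sketch of the change analysis step.
// similarity(a, b) returns a score in [0, 1]; isEqual({ oldRecord, newRecord })
// returns true when the two matched records are identical.
function classifyChanges(before, after, similarity, isEqual, threshold = 0.75) {
  const result = { created: [], updated: [], deleted: [], unchanged: [] };
  const remaining = [...after]; // new records not yet matched

  for (const oldRecord of before) {
    // Find the most similar unmatched new record scoring above the threshold.
    let bestIndex = -1;
    let bestScore = threshold;
    remaining.forEach((candidate, i) => {
      const score = similarity(oldRecord, candidate);
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });

    if (bestIndex === -1) {
      result.deleted.push(oldRecord); // nothing similar enough survived
      continue;
    }
    const [newRecord] = remaining.splice(bestIndex, 1);
    if (isEqual({ oldRecord, newRecord })) result.unchanged.push(newRecord);
    else result.updated.push({ oldRecord, newRecord });
  }

  result.created.push(...remaining); // unmatched new records
  return result;
}

// Toy usage with a stand-in similarity (exact hostname match).
const before = [{ hostname: "www", value: "1.1.1.1" }, { hostname: "old", value: "3.3.3.3" }];
const after = [{ hostname: "www", value: "9.9.9.9" }, { hostname: "new", value: "4.4.4.4" }];
const changes = classifyChanges(
  before,
  after,
  (a, b) => (a.hostname === b.hostname ? 1 : 0),
  ({ oldRecord, newRecord }) => oldRecord.value === newRecord.value
);
console.log(changes.updated.length, changes.deleted.length, changes.created.length); // 1 1 1
```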


4. Contribution of NLP

5. Example Scenario

Old Record:

{ "zone": "example.com", "hostname": "www", "value": "192.168.1.1" }

New Record:

{ "zone": "exmple.com", "hostname": "ww", "value": "192.168.1.1" }

Similarity Calculation:

  • Zone: "example.com" vs "exmple.com" → 88%

  • Hostname: "www" vs "ww" → 83%

  • Type and value are treated as identical in both records, so each scores 100%.

  • Total Score: (0.4*0.88) + (0.3*1) + (0.2*0.83) + (0.1*1) = 0.918 ≈ 0.92

  • Result: Match accepted (0.92 > 0.75).
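The weighted sum can be checked with a few lines of code. This sketch takes the per-field scores listed in the example as given inputs rather than recomputing them; the names WEIGHTS and weightedScore are hypothetical, while the weights and the 0.75 threshold come from this document.

```javascript
// Field weights from section 2.2 and the match threshold from section 3.
const WEIGHTS = { zone: 0.4, type: 0.3, hostname: 0.2, value: 0.1 };
const SIMILARITY_THRESHOLD = 0.75;

// Hypothetical helper: combines per-field similarity scores into one score.
function weightedScore(fieldScores) {
  return Object.entries(WEIGHTS).reduce(
    (total, [field, weight]) => total + weight * fieldScores[field],
    0
  );
}

// Field scores as listed in the example scenario.
const total = weightedScore({ zone: 0.88, type: 1, hostname: 0.83, value: 1 });
console.log(total.toFixed(3));             // "0.918"
console.log(total > SIMILARITY_THRESHOLD); // true
```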

6. Performance and Optimization

  • Large Datasets: Performance can be improved by grouping records based on zones.

  • Caching: Frequently used records can be cached for efficiency.

  • Threshold Adjustment: The SIMILARITY_THRESHOLD value can be adjusted dynamically based on the dataset.
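Grouping by zone, as suggested above, could be sketched like this. The function groupByZone and the zone field name are assumptions. Comparing only records within the same group avoids scoring every old record against every new one.

```javascript
// Hypothetical helper: buckets records by their zone field.
function groupByZone(records) {
  const groups = new Map();
  for (const record of records) {
    if (!groups.has(record.zone)) groups.set(record.zone, []);
    groups.get(record.zone).push(record);
  }
  return groups;
}

const groups = groupByZone([
  { zone: "example.com", hostname: "www" },
  { zone: "example.com", hostname: "mail" },
  { zone: "example.org", hostname: "www" },
]);
console.log(groups.get("example.com").length); // 2
console.log(groups.size);                      // 2
```

One trade-off to keep in mind: exact grouping places a misspelled zone such as "exmple.com" in its own group, so records in zones without an exact counterpart would still need a fuzzy comparison across groups.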

7. Dependencies

  • Natural Library: Used for text similarity calculations.

8. Conclusion

This system detects DNS record changes and uses NLP techniques to tolerate spelling errors when matching records. It provides a smarter matching mechanism without altering the output format of the existing system.

 
