Excalibur's Sheath

Python: Working with Files and Data

Mar 15, 2026 By: Jordan McGilvray
Tags: homelab, python, automation, scripting, file-handling, csv, json, encryption, sysadmin, cli

Python in the Homelab: Part 3 of 3

Last week we walked through the essential building blocks of Python — variables, loops, and conditionals — and saw how these constructs become the backbone of real scripts you’ll run in your homelab. If you missed it or want to refresh, that material is over at Python Variables, Loops, and Conditionals. Those concepts give you the logic needed to make decisions on data, and now it’s time to talk about where that data actually lives: files, structured formats, and directories on disk.

In a homelab, raw data rarely stays in one place. Logs accumulate, configuration files change, and monitoring tools constantly generate output. Servers, containers, and network appliances are all producing data simultaneously. Python gives you the ability to read, process, and organize this information efficiently, transforming raw output into summaries, reports, or archived records. When automated properly, these workflows remove hours of manual effort.

Beyond simply moving bytes around, Python can automate repetitive tasks such as aggregating CSV exports, parsing JSON outputs from monitoring tools, or scanning log files for critical errors. The logic constructs you learned previously—loops and conditionals—become far more powerful once they start operating on real system data. Instead of manually reviewing logs or copying values into spreadsheets, Python can perform those operations in seconds.

Finally, sensitive information sometimes needs protection. Logs may contain IP addresses, usernames, system paths, or service configurations that shouldn’t be stored in plaintext archives. This article introduces lightweight encryption using Python’s cryptography library, giving you the option to secure backups or log files before archiving. Combined with data processing and directory management, you’ll have the foundation to automate a secure, organized, and maintainable homelab environment.

Homelab Tip:
The moment you start automating file handling is the moment your homelab begins to scale. Manual log inspection works for one server. Automation works for ten.


Working with Plain Text Files

At its core, file handling means opening a file, reading or writing data, and then closing it safely. Python provides a simple way to do this using the built-in open() function.

The safest and most common pattern uses a context manager:

with open("example.log", "r") as f:
    data = f.read()
# file is automatically closed here

The with statement ensures the file always closes properly, even if the script encounters an error. In long-running automation scripts this matters enormously. Leaving files open can cause memory leaks or file locking problems, especially when scripts process hundreds of files.
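Under the hood, with is roughly shorthand for a try/finally block. This sketch (using a throwaway file so it runs on its own) shows the equivalent manual pattern:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.log")
with open(path, "w") as f:
    f.write("hello\n")

# Roughly what `with open(...)` does for you behind the scenes:
f = open(path, "r")
try:
    data = f.read()
finally:
    f.close()  # runs even if read() raises an exception

print(data.strip())  # → hello
```

The context manager version is shorter and impossible to get wrong, which is why it is the standard idiom.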

There are several ways to read file contents depending on your needs.

Common reading methods:

  • read() — loads the entire file into a single string
  • readline() — reads a single line at a time
  • readlines() — returns a list containing every line

When working with large logs, line-by-line iteration is usually the best approach because it keeps memory usage low.

Example: scanning a log file for errors.

with open("system.log", "r") as f:
    for line in f:
        if "ERROR" in line:
            print(line.strip())

This approach allows scripts to process very large logs efficiently, since only one line is stored in memory at a time.

Writing files works the same way:

with open("summary.txt", "w") as out:
    out.write("Processed summary of logs\n")

The mode parameter controls behavior:

Mode  Purpose
"r"   Read existing file
"w"   Write (overwrite existing content)
"a"   Append to file
"rb"  Read binary data

These fundamentals allow scripts to extract information, generate reports, and integrate with automated pipelines.

Automation Principle:
If you find yourself opening the same log file every day to check something, Python should probably be doing it for you.


Structured Data Files: CSV and JSON

Plain text files are useful, but most real system data uses structured formats. Two of the most common formats used in automation workflows are CSV and JSON.

Each format serves a different purpose.

  • CSV handles tabular data
  • JSON handles hierarchical or nested data

Understanding both allows Python scripts to interact with monitoring tools, APIs, configuration files, and exported reports.


CSV: Tabular Data

CSV (Comma-Separated Values) is extremely common because it is simple and portable. Many monitoring systems export performance metrics in CSV format, and spreadsheets frequently use it as an interchange format.

Python’s built-in csv module makes reading these files straightforward.

import csv
with open("usage.csv", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        cpu = float(row["cpu_load"])
        print(f"{row['timestamp']} – CPU: {cpu}")

DictReader is particularly useful because it maps columns to dictionary keys, making the data easier to work with.

Instead of referencing columns by number, you reference them by name, which improves readability and reduces mistakes.
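To see the difference, here is the same data read both ways, using a small in-memory sample in place of a real usage.csv:

```python
import csv
import io

# A small in-memory CSV standing in for usage.csv
sample = "timestamp,cpu_load\n2026-03-15 10:00,12.5\n"

# Index-based: fragile if the column order ever changes
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
for row in reader:
    print(row[0], row[1])

# Name-based: robust and self-documenting
reader = csv.DictReader(io.StringIO(sample))
for row in reader:
    print(row["timestamp"], row["cpu_load"])
```

If a monitoring tool ever adds a column or reorders its export, the DictReader version keeps working unchanged.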

Writing CSV files works similarly:

with open("summary.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["timestamp", "avg_cpu"])
    writer.writerow(["2026-03-15 10:00", 12.5])

CSV works best when your data resembles a table of measurements or metrics.

Typical homelab examples include:

  • CPU usage exports
  • bandwidth monitoring data
  • container resource statistics
  • uptime reports

JSON: Nested Data

JSON is the dominant format for configuration files and API responses.

Unlike CSV, JSON supports nested structures, meaning values can contain dictionaries and lists.

Example:

import json
with open("config.json") as f:
    data = json.load(f)
data["services"]["ssh"]["enabled"] = False
with open("config.json", "w") as f:
    json.dump(data, f, indent=2)

The indent=2 option formats the output so it remains human readable.

This matters more than people expect. In real environments, configuration files are often reviewed manually, audited, or tracked in version control systems.

Readable JSON makes troubleshooting dramatically easier.
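The json module supports both styles: compact separators for machine-to-machine exchange, and indent for files humans will read:

```python
import json

data = {"services": {"ssh": {"enabled": False}}}

# Compact: no extra whitespace, smallest possible payload
compact = json.dumps(data, separators=(",", ":"))

# Readable: nested and indented, easy to diff in version control
readable = json.dumps(data, indent=2)

print(compact)
print(readable)
```

The compact form fits on one line; the readable form spreads the same structure over several indented lines.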

Practical Rule:
Machines prefer compact data. Humans prefer readable data. Good automation respects both.


File and Directory Management

Handling file contents is only part of the picture. Automation scripts frequently need to organize and move files across directories.

Python’s pathlib module provides a modern and readable interface for filesystem operations.

Example: scanning a directory for log files.

from pathlib import Path
logs_dir = Path("/var/log")
for file in logs_dir.glob("*.log"):
    print(file.name)

Creating directories is equally simple:

archive = Path("archive")
archive.mkdir(exist_ok=True)  # no error if the directory already exists

Moving or renaming files:

old = Path("system.log")
new = archive / "system.log.bak"
old.replace(new)

With just a few lines of code you can automate tasks such as:

  • log rotation
  • archiving old files
  • organizing backup directories
  • cleaning temporary files

These small tasks add up quickly. A script that runs every night can maintain a clean filesystem automatically, preventing storage clutter from accumulating over time.
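As a sketch of one such nightly job, the snippet below deletes files older than 30 days. To stay self-contained it builds its own demo directory with one fresh file and one backdated file, so you can see the effect safely:

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE = 30 * 24 * 3600  # 30 days, in seconds
tmp_dir = Path(tempfile.mkdtemp())  # stand-in for a real temp directory
now = time.time()

# One fresh file and one file backdated 60 days, for demonstration
fresh = tmp_dir / "fresh.tmp"
stale = tmp_dir / "stale.tmp"
fresh.write_text("keep me")
stale.write_text("delete me")
os.utime(stale, (now - 60 * 24 * 3600,) * 2)  # fake an old mtime

for file in tmp_dir.glob("*"):
    # st_mtime is the file's last-modification time (Unix timestamp)
    if file.is_file() and now - file.stat().st_mtime > MAX_AGE:
        file.unlink()  # remove the stale file

print(sorted(f.name for f in tmp_dir.glob("*")))  # → ['fresh.tmp']
```

Point the same loop at a real directory, schedule it with cron, and old files disappear on their own.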

Automation Quote:
A tidy filesystem rarely happens by accident. It happens because a script quietly cleans things up while everyone sleeps.


Simple Data Processing: Filtering and Summaries

Once files are loaded into memory, the next step is usually data processing.

For many homelab use cases this means filtering values and generating summaries.

Example: calculating average CPU usage across multiple hosts.

totals = {}
with open("usage.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        host = row["hostname"]
        cpu = float(row["cpu_load"])
        totals.setdefault(host, []).append(cpu)
for host, cpus in totals.items():
    avg = sum(cpus) / len(cpus)
    print(f"{host} avg CPU: {avg:.1f}")

Python’s list comprehensions can simplify filtering operations. Continuing the example above, where cpus holds the samples for a single host:

high_loads = [cpu for cpu in cpus if cpu > 80.0]

This produces a list containing only high CPU values.
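Combining the two ideas, a comprehension over the per-host totals can flag machines whose average load crosses a threshold (the numbers here are made up):

```python
# totals maps hostname -> list of sampled CPU loads (made-up data)
totals = {
    "web01": [10.0, 15.0, 12.0],
    "db01": [85.0, 90.0, 88.0],
}

# Hosts whose average CPU load exceeds the threshold
hot_hosts = [
    host for host, cpus in totals.items()
    if sum(cpus) / len(cpus) > 80.0
]
print(hot_hosts)  # → ['db01']
```

A result like this can feed straight into an alert email or a daily report.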

By processing files line-by-line and avoiding unnecessary memory usage, Python scripts can comfortably handle very large datasets generated by monitoring systems.

Data Insight:
Raw logs are noise. Processed data becomes information.


Optional: Encrypting and Decrypting Files

Some files contain information that should not be stored in plaintext.

Examples include:

  • authentication logs
  • backup archives
  • system configuration exports

Python’s cryptography library includes a tool called Fernet, which provides symmetric encryption that is both secure and easy to implement.

Example encryption workflow:

from cryptography.fernet import Fernet
key = Fernet.generate_key()
fernet = Fernet(key)
# Encrypt
with open("secret.txt", "rb") as f:
    data = f.read()
encrypted = fernet.encrypt(data)
with open("secret.txt.enc", "wb") as out:
    out.write(encrypted)
# Decrypt
with open("secret.txt.enc", "rb") as f:
    encrypted = f.read()
decrypted = fernet.decrypt(encrypted)
with open("secret.decrypted.txt", "wb") as out:
    out.write(decrypted)

The encryption key must be protected carefully.

If the key is lost, the encrypted data becomes permanently inaccessible.

In real automation environments the key would usually be stored in:

  • a protected file
  • an environment variable
  • a secrets manager
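As one sketch of the environment-variable approach, FERNET_KEY below is a name chosen for illustration, not a convention of the library:

```python
import os
from cryptography.fernet import Fernet

# In production the variable would be set outside the script
# (shell profile, systemd unit, container secret); it is set
# here only so the example runs on its own.
os.environ["FERNET_KEY"] = Fernet.generate_key().decode()

# The script reads the key from the environment, never hard-codes it
key = os.environ["FERNET_KEY"]
fernet = Fernet(key)

token = fernet.encrypt(b"sensitive log line")
print(fernet.decrypt(token))  # → b'sensitive log line'
```

This keeps the key out of the source code and out of version control, while leaving the script itself portable between machines.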

Security Tip:
Encryption protects your data.
Key management protects your encryption.


Putting It All Together: Mini Automation Script

The following example combines multiple techniques from this article.

It:

  • parses log files
  • calculates averages
  • encrypts archived logs
  • generates a summary report

import csv
from pathlib import Path
from cryptography.fernet import Fernet
LOG_DIR = Path("/home/homelab/logs")
ARCHIVE_DIR = Path("/home/homelab/archive")
ARCHIVE_DIR.mkdir(exist_ok=True)
key = Path("encrypt.key").read_bytes()
fernet = Fernet(key)
summary = []
for log_file in LOG_DIR.glob("*.log"):
    with open(log_file) as f:
        reader = csv.DictReader(f)
        total_cpu = 0
        count = 0
        for row in reader:
            total_cpu += float(row["cpu_load"])
            count += 1
    avg_cpu = total_cpu / count if count else 0
    summary.append((log_file.stem, avg_cpu))
    data = log_file.read_bytes()
    enc = fernet.encrypt(data)
    (ARCHIVE_DIR / f"{log_file.name}.enc").write_bytes(enc)
    log_file.unlink()
with open("daily_summary.csv", "w", newline="") as summary_file:
    writer = csv.writer(summary_file)
    writer.writerow(["log", "avg_cpu"])
    writer.writerows(summary)

This script demonstrates how file handling, processing, encryption, and directory management can work together in a single automation pipeline.


Summary

This week we explored the many ways Python interacts with files, from plain text logs to structured CSV and JSON. Context managers and built-in functions make reading and writing safe and predictable, while dictionaries and loops transform raw data into actionable insights. These skills form the backbone of reliable homelab automation workflows.

Structured data handling allows you to consolidate reports, analyze system metrics, and manage configuration files across multiple devices. Combined with Python’s directory management and file organization features, you can automate the movement and storage of files with confidence.

Optional encryption adds another layer of protection, ensuring sensitive logs and backups remain secure. Python’s cryptography.fernet provides a reliable and straightforward approach suitable for homelabs of any size.

Finally, the automation example demonstrates how modular Python code can integrate reading, processing, encryption, and archiving into a single workflow. With these techniques, you can automate routine tasks, maintain a clean filesystem, and keep your homelab data organized and secure.

More from the "Python in the Homelab" Series: