Determining file size is a crucial aspect of data management, system monitoring, and optimizing storage solutions. Whether you’re working with large datasets, logging files, or cloud storage, knowing how to efficiently get the size of a file in Python is essential for performance tuning, automation, and real-time monitoring.
This guide will cover multiple methods to get file size in Python, ranging from basic approaches using os and pathlib to advanced solutions for large-scale applications.
Why Does File Size Matter in Python?
Understanding file size is important for:
- Optimizing storage and bandwidth usage in cloud applications.
- Monitoring log files to prevent uncontrolled growth.
- Handling large datasets efficiently in data science and machine learning.
- Managing system resources in embedded and high-performance computing.
Basic Methods to Get File Size in Python
Python provides several built-in methods to retrieve file sizes:
1. Using os.path.getsize() (Most Common Approach)
import os
file_path = "example.txt"
size = os.path.getsize(file_path)
print(f"File size: {size} bytes")
Why Use os.path.getsize()?
✅ Simple and efficient for local files
✅ Works across Windows, Linux, and macOS
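One practical caveat: os.path.getsize() raises an OSError (e.g. FileNotFoundError) if the path does not exist or cannot be read. A minimal sketch of a defensive wrapper (the helper name safe_get_size is our own, not a standard function):

```python
import os

def safe_get_size(file_path):
    """Return the file size in bytes, or None if the path is missing or unreadable."""
    try:
        return os.path.getsize(file_path)
    except OSError:  # Covers FileNotFoundError, PermissionError, etc.
        return None

print(safe_get_size("missing.txt"))  # Prints None if the file does not exist
```

Returning None (or a sentinel of your choice) keeps monitoring scripts running instead of crashing on a transiently deleted file.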
2. Using pathlib.Path.stat().st_size (Modern Approach)
The pathlib module, introduced in Python 3.4, provides an object-oriented approach to file handling:
from pathlib import Path
file_path = Path("example.txt")
size = file_path.stat().st_size
print(f"File size: {size} bytes")
Advantages of pathlib:
✅ More readable and Pythonic than os.path.getsize()
✅ Works on files and directories alike (note that a directory's st_size is the size of the directory entry itself, not of its contents)
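To illustrate the pathlib style on more than one file, a small sketch (the helper name list_file_sizes is hypothetical) that maps each file directly inside a directory to its size:

```python
from pathlib import Path

def list_file_sizes(directory):
    """Return {file name: size in bytes} for the files directly inside a directory."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}
```

For example, list_file_sizes(".") reports the sizes of the files in the current working directory.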
Advanced Techniques for Large-Scale Applications
For large files or cloud storage, basic methods may not be optimal. Here are some advanced techniques:
3. Getting File Size Without Loading the Entire File (Memory Efficient)
Reading an entire file into memory is inefficient for large datasets. Instead, we can seek to the end of the file and read the resulting offset, which never loads any of the file's contents:
def get_large_file_size(file_path):
    with open(file_path, "rb") as f:
        f.seek(0, 2)  # Move pointer to the end of the file (2 == os.SEEK_END)
        return f.tell()
size = get_large_file_size("large_data.csv")
print(f"Large file size: {size} bytes")
✅ Efficient for multi-gigabyte files
✅ Avoids unnecessary memory usage
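To convince yourself that the seek-based approach agrees with os.path.getsize(), a quick sanity check using a throwaway temporary file might look like this (the function name size_via_seek is our own):

```python
import os
import tempfile

def size_via_seek(file_path):
    # whence=2 means "relative to the end of the file", so tell() reports the total size
    with open(file_path, "rb") as f:
        f.seek(0, 2)
        return f.tell()

# Write 1 KiB to a temporary file and compare both methods
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    path = tmp.name
assert size_via_seek(path) == os.path.getsize(path) == 1024
os.remove(path)
```

Both report the same byte count; the seek variant is mainly useful when you already hold an open file object.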
4. Getting File Size in Human-Readable Format
For better user experience, we can convert bytes to KB, MB, GB:
def convert_size(size_in_bytes):
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_in_bytes < 1024:
            return f"{size_in_bytes:.2f} {unit}"
        size_in_bytes /= 1024
    return f"{size_in_bytes:.2f} PB"  # Fallback for sizes beyond terabytes
file_size = os.path.getsize("example.txt")
print(f"File size: {convert_size(file_size)}")
✅ Useful for UI displays, reports, and logging
5. Getting File Size in Cloud Storage (AWS S3, Google Drive, etc.)
For cloud applications, file size is retrieved through the provider's API rather than the local filesystem.
AWS S3 Example
import boto3
s3 = boto3.client("s3")
def get_s3_file_size(bucket, file_key):
    response = s3.head_object(Bucket=bucket, Key=file_key)
    return response["ContentLength"]
size = get_s3_file_size("my-bucket", "backup/data.csv")
print(f"S3 File Size: {convert_size(size)}")
✅ Ideal for serverless applications and cloud storage
Handling Special Cases: Directories and Compressed Files
6. Getting Total Directory Size in Python
To calculate the total size of a directory, sum the size of all files:
def get_directory_size(directory):
    total_size = 0
    for file in Path(directory).rglob("*"):  # Recursively find all entries
        if file.is_file():
            total_size += file.stat().st_size
    return total_size
dir_size = get_directory_size("my_folder")
print(f"Directory size: {convert_size(dir_size)}")
✅ Useful for monitoring disk space usage
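If you prefer the os module over pathlib, the same recursive traversal can be written with os.walk. A sketch (the function name get_directory_size_walk is our own):

```python
import os

def get_directory_size_walk(directory):
    """Sum the sizes of all files under a directory using os.walk."""
    total = 0
    for root, _dirs, files in os.walk(directory):  # Walks subdirectories recursively
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

The two approaches give the same totals for regular files; pick whichever matches the rest of your codebase.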
7. Getting File Size in a ZIP Archive
Compressed files require special handling:
import zipfile
def get_zip_file_size(zip_path, file_name):
    with zipfile.ZipFile(zip_path, "r") as zipf:
        # file_size is the uncompressed size; use compress_size for the stored size
        return zipf.getinfo(file_name).file_size
zip_size = get_zip_file_size("archive.zip", "document.txt")
print(f"Uncompressed file size in ZIP: {convert_size(zip_size)}")
✅ Essential for handling backups and archived data
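To see how much space compression actually saves, you can compare file_size with compress_size for every archive member. A sketch (the helper name compression_report is our own):

```python
import zipfile

def compression_report(zip_path):
    """Return (name, uncompressed bytes, compressed bytes) for each ZIP member."""
    with zipfile.ZipFile(zip_path, "r") as zipf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zipf.infolist()]
```

This is handy when deciding whether an archive is worth recompressing or whether extraction will fit on disk.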
Best Practices for Getting File Size in Python
- ✅ Use os.path.getsize() or pathlib.Path.stat().st_size for local files
- ✅ For large files, use seek(0, 2) to avoid memory overhead
- ✅ Convert bytes to human-readable formats for better reporting
- ✅ For cloud storage, use the appropriate API methods
- ✅ For ZIP and compressed files, handle them with the zipfile module or external tools
Final Thoughts
Knowing how to get the size of a file in Python is essential for developers, data scientists, and system administrators. By using efficient methods and best practices, you can optimize storage, improve performance, and manage resources effectively.
For further details, check out the official Python pathlib documentation:
Pathlib – Object-Oriented Filesystem Paths
Would you like to see real-world examples for handling file size in cloud platforms like Azure and Google Cloud? Let us know in the comments!