Determining file size is a crucial aspect of data management, system monitoring, and optimizing storage solutions. Whether you’re working with large datasets, logging files, or cloud storage, knowing how to efficiently get the size of a file in Python is essential for performance tuning, automation, and real-time monitoring.
This guide will cover multiple methods to get file size in Python, ranging from basic approaches using os and pathlib to advanced solutions for large-scale applications.
Why Does File Size Matter in Python?
Understanding file size is important for:
- Optimizing storage and bandwidth usage in cloud applications.
- Monitoring log files to prevent uncontrolled growth.
- Handling large datasets efficiently in data science and machine learning.
- Managing system resources in embedded and high-performance computing.
Basic Methods to Get File Size in Python
Python provides several built-in methods to retrieve file sizes:
1. Using os.path.getsize() (Most Common Approach)
import os
file_path = "example.txt"
size = os.path.getsize(file_path)
print(f"File size: {size} bytes")
Why Use os.path.getsize()?
✅ Simple and efficient for local files
✅ Works across Windows, Linux, and macOS
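One practical caveat: os.path.getsize() raises an OSError (e.g. FileNotFoundError) if the path does not exist or cannot be read. A minimal sketch of a defensive wrapper (the helper name safe_get_size is our own, not a standard function):

```python
import os

def safe_get_size(file_path):
    """Return the file size in bytes, or None if the path is missing or unreadable."""
    try:
        return os.path.getsize(file_path)
    except OSError:  # Covers FileNotFoundError, PermissionError, etc.
        return None

print(safe_get_size("missing.txt"))  # Prints None if the file does not exist
```

Returning None (or a sentinel of your choice) keeps monitoring scripts running instead of crashing on a transiently deleted file.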
2. Using pathlib.Path.stat().st_size (Modern Approach)
The pathlib module, introduced in Python 3.4, provides an object-oriented approach to file handling:
from pathlib import Path
file_path = Path("example.txt")
size = file_path.stat().st_size
print(f"File size: {size} bytes")
Advantages of pathlib:
✅ More readable and Pythonic than os.path.getsize()
✅ Works on files and directories alike (note that a directory's st_size is the size of the directory entry itself, not of its contents)
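To illustrate the pathlib style on more than one file, a small sketch (the helper name list_file_sizes is hypothetical) that maps each file directly inside a directory to its size:

```python
from pathlib import Path

def list_file_sizes(directory):
    """Return {file name: size in bytes} for the files directly inside a directory."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}
```

For example, list_file_sizes(".") reports the sizes of the files in the current working directory.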
Advanced Techniques for Large-Scale Applications
For large files or cloud storage, basic methods may not be optimal. Here are some advanced techniques:
3. Getting File Size Without Loading the Entire File (Memory Efficient)
Reading an entire file into memory is inefficient for large datasets. Instead, we can seek to the end of the file and read the resulting offset, which never loads any of the file's contents:
def get_large_file_size(file_path):
    with open(file_path, "rb") as f:
        f.seek(0, 2)  # Move pointer to the end of the file (2 == os.SEEK_END)
        return f.tell()
size = get_large_file_size("large_data.csv")
print(f"Large file size: {size} bytes")
✅ Efficient for multi-gigabyte files
✅ Avoids unnecessary memory usage
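To convince yourself that the seek-based approach agrees with os.path.getsize(), a quick sanity check using a throwaway temporary file might look like this (the function name size_via_seek is our own):

```python
import os
import tempfile

def size_via_seek(file_path):
    # whence=2 means "relative to the end of the file", so tell() reports the total size
    with open(file_path, "rb") as f:
        f.seek(0, 2)
        return f.tell()

# Write 1 KiB to a temporary file and compare both methods
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    path = tmp.name
assert size_via_seek(path) == os.path.getsize(path) == 1024
os.remove(path)
```

Both report the same byte count; the seek variant is mainly useful when you already hold an open file object.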
4. Getting File Size in Human-Readable Format
For better user experience, we can convert bytes to KB, MB, GB:
def convert_size(size_in_bytes):
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_in_bytes < 1024:
            return f"{size_in_bytes:.2f} {unit}"
        size_in_bytes /= 1024
    return f"{size_in_bytes:.2f} PB"  # Fallback for sizes beyond terabytes
file_size = os.path.getsize("example.txt")
print(f"File size: {convert_size(file_size)}")
✅ Useful for UI displays, reports, and logging
5. Getting File Size in Cloud Storage (AWS S3, Google Drive, etc.)
For cloud applications, file size is retrieved through the provider's API rather than the local filesystem.
AWS S3 Example
import boto3
s3 = boto3.client("s3")
def get_s3_file_size(bucket, file_key):
    response = s3.head_object(Bucket=bucket, Key=file_key)
    return response["ContentLength"]
size = get_s3_file_size("my-bucket", "backup/data.csv")
print(f"S3 File Size: {convert_size(size)}")
✅ Ideal for serverless applications and cloud storage
Handling Special Cases: Directories and Compressed Files
6. Getting Total Directory Size in Python
To calculate the total size of a directory, sum the size of all files:
def get_directory_size(directory):
    total_size = 0
    for file in Path(directory).rglob("*"):  # Recursively find all entries
        if file.is_file():
            total_size += file.stat().st_size
    return total_size
dir_size = get_directory_size("my_folder")
print(f"Directory size: {convert_size(dir_size)}")
✅ Useful for monitoring disk space usage
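If you prefer the os module over pathlib, the same recursive traversal can be written with os.walk. A sketch (the function name get_directory_size_walk is our own):

```python
import os

def get_directory_size_walk(directory):
    """Sum the sizes of all files under a directory using os.walk."""
    total = 0
    for root, _dirs, files in os.walk(directory):  # Walks subdirectories recursively
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

The two approaches give the same totals for regular files; pick whichever matches the rest of your codebase.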
7. Getting File Size in a ZIP Archive
Compressed files require special handling:
import zipfile
def get_zip_file_size(zip_path, file_name):
    with zipfile.ZipFile(zip_path, "r") as zipf:
        # file_size is the uncompressed size; use compress_size for the stored size
        return zipf.getinfo(file_name).file_size
zip_size = get_zip_file_size("archive.zip", "document.txt")
print(f"Uncompressed file size in ZIP: {convert_size(zip_size)}")
✅ Essential for handling backups and archived data
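To see how much space compression actually saves, you can compare file_size with compress_size for every archive member. A sketch (the helper name compression_report is our own):

```python
import zipfile

def compression_report(zip_path):
    """Return (name, uncompressed bytes, compressed bytes) for each ZIP member."""
    with zipfile.ZipFile(zip_path, "r") as zipf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zipf.infolist()]
```

This is handy when deciding whether an archive is worth recompressing or whether extraction will fit on disk.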
Best Practices for Getting File Size in Python
- ✅ Use os.path.getsize() or pathlib.Path.stat().st_size for local files
- ✅ For large files, use seek(0, 2) to avoid memory overhead
- ✅ Convert bytes to human-readable formats for better reporting
- ✅ For cloud storage, use the appropriate API methods
- ✅ For ZIP and compressed files, handle them with the zipfile module or external tools
Final Thoughts
Knowing how to get the size of a file in Python is essential for developers, data scientists, and system administrators. By using efficient methods and best practices, you can optimize storage, improve performance, and manage resources effectively.
For further details, check out the official Python pathlib documentation:
Pathlib – Object-Oriented Filesystem Paths
Would you like to see real-world examples for handling file size in cloud platforms like Azure and Google Cloud? Let us know in the comments!