September 21, 2008 will

Announcing FS 0.1.0, a Python file-system

This is something I have been hacking together for a while now; FS is a file-system abstraction for Python. It has reached a stable state and is worthy of an official (0.1.0) release.

First, a brief history of the project. A while back, I was working mainly with desktop applications in wxPython. I found that I had a number of sources for files; there were read-only resources such as images, per-user files and per-installation config files. The location of these files would change depending on whether I was debugging or building a release version. The logic required to manage all this was pretty ugly and error-prone. So I wrote a collection of classes to bring all these disparate locations for files under a single virtual file-system. For example, to open a config file I could just read 'user/settings.cfg', and to open an image resource I could just read from 'resources/logo.png' and the virtual file-system would do the right thing and return a file-like object.

This turned out to be insanely useful, and I used it for a number of projects. I also used it while working for ICC, where I had the opportunity to enhance it based on feedback from colleagues. In the last few months I have re-written it from scratch, partly because I wanted to avoid any copyright issues, but mainly because I could make a better job of it the second time around.

Getting Started

You can install FS with the command 'easy_install fs', or manually by downloading the latest release from (http://code.google.com/p/pyfilesystem/downloads/list) and running 'python setup.py install'. FS has been checked on Linux and Windows, but should run anywhere since ultimately it uses the standard library. There is no documentation other than the docstrings at the moment, so this post will have to suffice till I write up the API.

The main module is called fs, and there are a number of sub-modules:

  • fs.osfs Contains the class OSFS, which is a simple layer around the operating system's own file-system.
  • fs.memoryfs Contains the class MemoryFS, which is a file-system that exists only in memory.
  • fs.mountfs Contains the class MountFS which can be used to mount other file-systems at various places in the directory structure.
  • fs.multifs Contains the class MultiFS which creates a file-system from other file-systems which are tried in-order till a file operation succeeds.
  • fs.zipfs Contains the class ZipFS which creates a file-system from a zip file.
  • fs.tempfs Contains the class TempFS which creates a temporary file-system that can be automatically cleaned up.
  • fs.utils Contains a number of functions for dealing with FS objects.
  • fs.browsewin Contains the 'browse' function which opens a tree view of the file-system passed to it. This is mainly a debugging aid. Requires wxPython.
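
To make the "tried in-order" idea behind MultiFS concrete, here is a minimal standard-library sketch. It is not FS's actual code: the class name FirstMatchOpener and its open method are hypothetical, invented only for this illustration, and it is written for Python 3.

```python
import os
import tempfile


class FirstMatchOpener(object):
    """Hypothetical helper, not part of FS: opens a path from the first root that has it."""

    def __init__(self, *roots):
        self.roots = list(roots)

    def open(self, path, mode="r"):
        # Try each root directory in order; the first successful open wins.
        for root in self.roots:
            try:
                return open(os.path.join(root, path), mode)
            except IOError:
                continue
        raise IOError("%r not found in any root" % path)


# Usage: a per-user directory that overrides installation defaults.
user_dir = tempfile.mkdtemp()
defaults_dir = tempfile.mkdtemp()
with open(os.path.join(defaults_dir, "settings.cfg"), "w") as f:
    f.write("theme=default")
with open(os.path.join(defaults_dir, "logo.txt"), "w") as f:
    f.write("logo")
with open(os.path.join(user_dir, "settings.cfg"), "w") as f:
    f.write("theme=dark")

opener = FirstMatchOpener(user_dir, defaults_dir)
```

Here 'settings.cfg' would be read from the user directory because it is listed first, while 'logo.txt' falls through to the defaults directory.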

Examples

In lieu of documentation, I'm just going to run through an interactive session that shows a few features.

>>> from fs import *
>>> home = osfs.OSFS('~/')
>>> from fs.browsewin import browse
>>> browse(home)

This creates an object called 'home' which represents your home directory (use a different path if you are on a non-Linux system). You can use it to open files, list directory contents, copy files etc. But it can never access files that aren't under your home directory, so it can be considered a sandboxed view onto the underlying file-system of the OS. The root of the home object is re-positioned, so the path "/readme.txt" would map to "~/readme.txt". Most of the methods of an FS object are pretty self-explanatory, but there are a few that require explanation. Try the following for example:

>>> projects = home.opendir('projects') # Assumes there is a directory called ~/projects
>>> projects
<SubFS: /projects in <OSFS: /home/will>>
>>> browse(projects)

There is no concept of a working directory for FS objects. Rather than something like 'chdir' there is an opendir method that returns a new FS object representing everything under the sub-directory root. So if you were to do projects.open("test.py"), it would return a file object for "~/projects/test.py".
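
The two ideas above (a re-positioned root, and opendir instead of chdir) can be sketched with nothing but the standard library. This is only an illustration of the concept under my own assumptions, not how OSFS is implemented; the class SandboxDir and its syspath method are hypothetical names, and the code targets Python 3.

```python
import os
import tempfile


class SandboxDir(object):
    """Hypothetical stand-in for the idea behind OSFS/opendir; not FS's real code."""

    def __init__(self, root):
        self.root = os.path.realpath(os.path.expanduser(root))

    def syspath(self, path):
        # Treat the sandbox root as "/", so "/readme.txt" means root + "readme.txt".
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        # Refuse anything that resolves outside the root (e.g. via "..").
        if os.path.commonpath([self.root, full]) != self.root:
            raise ValueError("path escapes the sandbox: %r" % path)
        return full

    def open(self, path, mode="r"):
        return open(self.syspath(path), mode)

    def opendir(self, path):
        # No working directory: return a new object rooted at the sub-directory.
        return SandboxDir(self.syspath(path))


# Usage with a throwaway directory standing in for the home directory.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "projects"))
with open(os.path.join(base, "projects", "test.py"), "w") as f:
    f.write("print('hello')\n")

home = SandboxDir(base)
projects = home.opendir("projects")
```

With this sketch, projects.open("test.py") and home.open("/projects/test.py") refer to the same file, and a path such as "../outside.txt" is rejected.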

Have you ever wondered how much space the .py files take up in your home directory? Give this a try:

>>> print sum(home.getsize(path) for path in home.walkfiles(wildcard="*.py"))
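
For comparison, here is roughly what that one-liner would take with only the standard library. This is a sketch in Python 3; total_size is a hypothetical helper name, and I am assuming the wildcard is matched against file names only, not whole paths.

```python
import fnmatch
import os


def total_size(root, wildcard="*.py"):
    """Sum the sizes in bytes of files under root whose names match wildcard."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the shell-style pattern.
        for name in fnmatch.filter(filenames, wildcard):
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Calling total_size(os.path.expanduser("~")) would give the figure for your home directory.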

Here's how you could use ZipFS to archive your projects folder:

>>> projects_archive = zipfs.ZipFS('projects.zip', 'w')
>>> from fs.utils import copydir
>>> copydir(projects, projects_archive)
>>> projects_archive.close()
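
If you want to see what a copy like this amounts to without FS, the following is a rough standard-library sketch in Python 3. The function archive_dir is a hypothetical name of my own, and it makes no claim about how ZipFS or copydir work internally.

```python
import os
import zipfile


def archive_dir(src_dir, zip_path):
    """Write every file under src_dir into a new zip file at zip_path."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir so the archive holds no absolute paths.
                archive.write(full, os.path.relpath(full, src_dir))
```

Note that zip_path should live outside src_dir, otherwise the sketch would try to add the archive to itself.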

If you are interested in learning more, have a look at the docstrings in base.py, or just ask me. I promise to get proper documentation up soon.

FS is politeware: you can use it for any purpose you like, as long as you say thanks!

Disclaimer: FS has been well tested, but as there is file access involved, be careful!
gravatar
Ilpo

Have you thought about FUSE support?

gravatar
Stavros

Hmm, I have created something similar, omnisync: https://launchpad.net/omnisync.

It's a file synchroniser but it uses virtual file systems for synchronising files, so you can just specify an sftp/s3/local server or whatever and it will sync the two directories. I thought of doing a zip module as well but since zips don't support some features such as arbitrary writes, I decided against it. How does FS handle that?

gravatar
Michael Foord

Does it work with Windows?

gravatar
Will

Ilpo, I don't know much about FUSE, but the OSFS object will expose the Linux filesystem. Exposing an FS object to FUSE is not something I have plans to do, but I'm sure it's possible!

Stavros, thanks for the link, will check it out. The ZipFS object creates temporary files for writing. When the temp file is closed, its contents are added to the zip.

gravatar
Will

Michael, yes it does. :-)

gravatar
Jussi

How about returning information about the current path object when no parameter is given? For example:

tmp = osfs.OSFS('C:\\tmp')
# This requires name of the directory and will fail.
tmp.isdirempty()
# This does what I wanted though.
tmp.isdirempty(".")

There are others too: getinfo, isdir, isfile,...
listdir works like that :)

Anyway... Nice work! This is just what I needed. Thank you.

gravatar
Michael Foord

Cool :-)

gravatar
Glyph Lefkowitz

Twisted has something similar called FilePath.

http://twistedmatrix.com/documents/8.1.0/api/twisted.python.filepath.FilePath.html

It's got some interesting advantages; for example, there's a Zip implementation which can stream data out of an archive rather than reading the entire contents of an individual file into memory.

http://twistedmatrix.com/documents/8.1.0/api/twisted.python.zippath.ZipArchive.html

Another advantage of this is that you can get access to a FilePath object from twisted.python.modules, allowing something sort of like setuptools' pkg_resources (you can load resources from "next to your code" whether it's in a zip file or not), but with much less overhead.

http://twistedmatrix.com/documents/8.1.0/api/twisted.python.modules.html

But your offering here looks much more coherent and polished. We could really use some help maintaining, documenting, and promoting it, if you would be willing to join forces :).

gravatar
Doug Napoleone

One place where this could be of a huge benefit is on the Google App Engine platform which does not have a 'real filesystem' and most of the os/os.path features are missing.

Unlike twisted it looks like it would not take much to get a subset working under GAE (the wx browser stuff would need a GAE html interface).

This would be a huge help to getting existing projects up and running under GAE.

gravatar
Kevin Dangoor

This does look quite nifty. Have you checked out Jason Orendorff's path.py module? Though the aims are different, there is some overlap in that path.py is aiming to make working with files easier (and I've found that it does indeed make working with files much easier).

gravatar
Jean-Paul Calderone

Doug, when you said "Unlike twisted it looks like it would not take much to get a subset working under GAE", what exactly did you mean? FS 0.1.0 isn't comparable to Twisted. It is comparable to one module in Twisted, twisted.python.filepath. Do you mean that twisted.python.filepath would be much harder to get working on GAE than FS 0.1.0 would be? If so, can you explain why?

(Will, sorry, I hope this doesn't devolve into some boring off-topic discussion, but I'm really curious about what Doug meant.)

gravatar
Will

Jussi, I'll give that some thought!

Glyph, I was disappointed that the zipfile module in the standard library couldn't stream zipped files. I'm all for joining forces, but I'd like to concentrate on my interface. I may borrow your zip implementation -- in the spirit of open source. :-)

Doug, Good idea. Although I recall reading about someone who worked around the limit. Putting templates in a zip file perhaps.

Kevin, I have used Jason's path.py module. Very useful it is too, but I didn't like putting all that functionality in a class derived from a string. It didn't sit well with me, from an OO point of view. It can still be used with FS though.

Jean, boring off-topic discussions are always welcome here!

gravatar
David Grant

A comment about Jussi's comment. It's a good idea, and I am somewhat in favour of doing this, although it adds a bit of "magic", which isn't always a good thing. Jussi should note though that you can just pass in an empty string. You don't have to pass in "." in order to get information about the root.

I have to compliment you on your code style. I like how you chose to make all those simple methods in osfs (like exists, isdir, isfile) into 2-liners, instead of one-liners.

That's cool that you got within-zipfile copying working. I finally figured out how you did it after looking at the code for a while.

I think remove() could work too. We could just lazily store all the remove operations and then when the zip file is closed, re-create the entire zip file from scratch with those files removed.
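
A minimal sketch of that deferred-remove idea, using only the standard zipfile module in Python 3. The class LazyRemoveZip is hypothetical and is not part of FS; it only illustrates queuing removals and rebuilding the archive on close.

```python
import os
import tempfile
import zipfile


class LazyRemoveZip(object):
    """Hypothetical wrapper, not part of FS: queue removals, rewrite the zip on close."""

    def __init__(self, zip_path):
        self.zip_path = os.path.abspath(zip_path)
        self.removed = set()

    def remove(self, name):
        # Only record the request; the archive is untouched until close().
        self.removed.add(name)

    def close(self):
        if not self.removed:
            return
        # Build the replacement next to the original so the final swap stays on one disk.
        fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=os.path.dirname(self.zip_path))
        os.close(fd)
        with zipfile.ZipFile(self.zip_path, "r") as src:
            with zipfile.ZipFile(tmp_path, "w") as dst:
                # Copy every member that was not marked for removal.
                for info in src.infolist():
                    if info.filename not in self.removed:
                        dst.writestr(info, src.read(info.filename))
        os.replace(tmp_path, self.zip_path)
        self.removed.clear()
```

The cost is that every surviving file is read and rewritten on close, so this suits small archives better than large ones.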

gravatar
jaross
what about itools.vfs? http://docs.hforge.org/itools/vfs.html
gravatar
Andy
Hi Will - just downloaded this and it looks great! Thank you very much for doing it!

Just a quick question…. I'm just wanting to confirm what the license for “fs” is. “Politeware” is mentioned above, but in the “pkg-info” file it mentions the Python license. So, if you could confirm which one it is, that'll be great!
Thanks again - bye for now -
- Andy
gravatar
Evan Driscoll
I was looking for something I could use as a mock file system for testing purposes, and came across this. Very helpful; thanks!

I have a couple suggestions, and a function that you may find useful.

First, the suggestions:
- Make MemoryFile into a context manager so you can say
“with fs.memoryfs.open(…) as file:” as you can with the open
builtin. I can work around this easily enough, though:

def enterMemoryFile(self):
    pass

def exitMemoryFile(self, a, b, c):
    self.close()

mfs.MemoryFile.__enter__ = enterMemoryFile
mfs.MemoryFile.__exit__ = exitMemoryFile

(I imported fs.memoryfs as mfs.)

- Support the notion of a current directory

- Make your path-handling more path-separator agnostic. One great
suggestion I saw was to make a mock filesystem where os.sep was
something very weird; but you seem to depend on it being either
/ or \:
>>> import os
>>> os.sep='#'
>>> from fs.memoryfs import MemoryFS
>>> fs = MemoryFS()
>>> fs.makedir("a#b", recursive=True)
<MemoryFS>
>>> fs.makedir("a/c", recursive=True)
<MemoryFS>
>>> fs.makedir("a\\d", recursive=True)
<MemoryFS>
>>> for e in fs.walk('/a'):
...     print e
... 
('/a', [])
('/a/d', [])
('/a/c', [])

Now a handy function. Instead of a bunch of makedir and createfile calls to set up the file system, I wrote a function that will let you basically write a file system as a bunch of nested dictionaries. The keys to dictionaries are the file/directory names, and the values are the contents. String contents are files with that string, and dictionary contents are subdirectories.

def create_mock_filesystem(fs_dict):
    fs = memoryfs.MemoryFS()
    _create_mock_filesystem_impl(fs_dict, fs.opendir('/'))
    return fs

def _create_mock_filesystem_impl(fs_dict, subfs):
    for direntry, contents in fs_dict.iteritems():
        assert type(direntry) is str

        if type(contents) is str:
            # direntry is a file
            subfs.createfile(direntry, contents)
        elif type(contents) is dict:
            # direntry is a directory
            subfs.makedir(direntry)
            _create_mock_filesystem_impl(contents, subfs.opendir(direntry))
        else:
            assert False

To wit:

>>> fs_dict = {
...     'file.txt' : 'I am a file!',
...     'some_dir' : {
...             'file.txt' : 'I am another file!\nIn fact, I have two lines!',
...     }
... }
>>> fs = create_mock_filesystem(fs_dict)
>>> for name in ['file.txt', '/some_dir/file.txt']:
...     print "Contents of file", name
...     with fs.open(name) as file:
...             for line in file:
...                     print "   ", line.rstrip()
... 
Contents of file file.txt
    I am a file!
Contents of file /some_dir/file.txt
    I am another file!
    In fact, I have two lines!

So thanks again, you saved me from a bit of reimplementation. ;-)
gravatar
Evan Driscoll
Damn, there's a big mistake in my last post… I pasted a version of enterMemoryFile from before I tested it and fixed a bug. That function needs to return self.

Here's the actual code I have now:
# Allow MemoryFile to be used as a context manager ('with fs.open(...) as file:')
try:
    memoryfs.MemoryFile.__enter__
except AttributeError:
    def _memoryfile_enter(self):
        return self
    memoryfs.MemoryFile.__enter__ = _memoryfile_enter
try:
    memoryfs.MemoryFile.__exit__
except AttributeError:
    def _memoryfile_exit(self, a, b, c):
        self.close()
    memoryfs.MemoryFile.__exit__ = _memoryfile_exit