I'd like to announce version 2.0.0 of PyFilesystem, which is now available on PyPI.

PyFilesystem is a Python module I started some time in 2008, and since then it has been very much a part of my personal standard library. I've used it in personal and professional projects, as have many other developers and organisations.

Recap

If you aren't familiar with PyFilesystem, it's an abstraction layer for filesystems. Essentially anything with files and directories (hard drive, zip file, FTP server, network filesystems, etc.) may be wrapped with a common interface. With it, you can write code that is agnostic as to where the files are physically located.

Here's a quick example that recursively counts the lines of code in a directory:

def count_python_loc(fs):
    """Count non-blank lines of Python code."""
    count = 0
    for path in fs.walk.files(filter=['*.py']):
        with fs.open(path) as python_file:
            count += sum(1 for line in python_file if line.strip())
    return count

from fs import open_fs
projects_fs = open_fs('~/projects')
print(count_python_loc(projects_fs))

The fs argument to count_python_loc is an FS object, which encapsulates everything you would need to do with a filesystem. Because of this abstraction, the same code will work with any filesystem. For instance, counting the lines of code in a zip file is a one-line change:

projects_fs = open_fs('zip://projects.zip')
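To make the principle concrete without needing fs installed, here is a stdlib-only sketch of the same idea. It is not how PyFilesystem works internally: `DirSource` and `ZipSource` are hypothetical classes that expose the same two methods (`iter_files` and `open_text`), so one counting function works with either backend.

```python
import fnmatch
import io
import os
import zipfile


class DirSource:
    """Hypothetical backend: a directory on the OS filesystem."""

    def __init__(self, root):
        self.root = os.path.expanduser(root)

    def iter_files(self, pattern):
        # Walk the tree, yielding matching paths relative to the root.
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in fnmatch.filter(filenames, pattern):
                yield os.path.relpath(os.path.join(dirpath, name), self.root)

    def open_text(self, path):
        return open(os.path.join(self.root, path), encoding="utf-8")


class ZipSource:
    """Hypothetical backend: the members of a zip archive."""

    def __init__(self, zip_path):
        self.archive = zipfile.ZipFile(os.path.expanduser(zip_path))

    def iter_files(self, pattern):
        for name in self.archive.namelist():
            # Directory entries end with '/'; match on the last path component.
            basename = name.rsplit("/", 1)[-1]
            if not name.endswith("/") and fnmatch.fnmatch(basename, pattern):
                yield name

    def open_text(self, path):
        return io.TextIOWrapper(self.archive.open(path), encoding="utf-8")

    def close(self):
        self.archive.close()


def count_python_loc(source):
    """Count non-blank lines of Python code; works with either backend."""
    count = 0
    for path in source.iter_files("*.py"):
        with source.open_text(path) as python_file:
            count += sum(1 for line in python_file if line.strip())
    return count
```

The counting function never checks which backend it was given; that is the whole point of a common interface, and PyFilesystem applies it to many more kinds of storage.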

See my previous posts on PyFilesystem for more back-story.

The tree method renders the filesystem structure with Unicode box-drawing characters. This can be a nice way of reporting file changes in a command-line app, and a useful debugging aid in general.
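The rendering idea itself is simple. Here is a stdlib-only sketch of it, not PyFilesystem's implementation: the hypothetical `render_tree` function takes a nested dict (dicts are directories, `None` marks a file) and draws it with box-drawing characters, listing directories before files.

```python
def render_tree(node, prefix=""):
    """Return the lines of a tree drawing for a nested dict."""
    lines = []
    # Directories first, then files; each group sorted by name.
    entries = sorted(node.items(), key=lambda item: (item[1] is None, item[0]))
    for index, (name, child) in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        if child is not None:
            # Continue the vertical rule only if more siblings follow.
            lines.extend(render_tree(child, prefix + ("    " if last else "│   ")))
    return lines


tree = {"fs": {"base.py": None, "copy.py": None}, "setup.py": None, "README.md": None}
print("\n".join(render_tree(tree)))
```

The prefix passed down to each level is what keeps the vertical lines aligned with their parent directories.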


Why the update?

A lot has happened since 2008. Python 3 happened. The IO library happened. Scandir happened. And while PyFilesystem has kept up with those developments, the code has suffered from the weight of small incremental changes. Perhaps not to the degree that it required a rewrite, but a new version gave me the opportunity to make improvements to the API that couldn't be made without breaking a lot of code.

The rewrite was guided by 8 years of observations about what developers wanted from the library: what worked well, what felt awkward, and the numerous edge cases in creating the illusion that all filesystems work alike (they really don't). A lot of agonizing has gone into the design of the new API to make it simple to use without sacrificing functionality. This often required breaking up large methods into more atomic methods that do one thing and do it well.

Another motivation for this version was to make it easier to implement new filesystems. It turns out that you need to write only 7 methods to implement any filesystem, which I find somewhat remarkable. The new API has been designed to make developing your own filesystems much easier in general, with more of the heavy lifting being done by the base class. I'm hoping this will encourage more developers to release new filesystems, and to port pre-2.0.0 filesystems (which I'll gladly assist with).
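As a rough, stdlib-only illustration of that division of labour (not PyFilesystem's real `FS` base class), the hypothetical `BaseFS` below builds conveniences such as `exists`, `isdir`, `writetext` and `readtext` purely on primitives, and the hypothetical `DictFS` supplies those primitives from Python dicts. The seven primitive names follow the essential methods listed in the fs 2 documentation (getinfo, listdir, makedir, openbin, remove, removedir, setinfo), but the signatures and return values here are simplified assumptions.

```python
import io


class _WriteBuffer(io.BytesIO):
    """In-memory file that hands its contents to a callback when closed."""

    def __init__(self, on_close):
        super().__init__()
        self._on_close = on_close

    def close(self):
        if not self.closed:
            self._on_close(self.getvalue())
        super().close()


class BaseFS:
    """Toy base class. Subclasses implement the seven primitives:
    getinfo, listdir, makedir, openbin, remove, removedir, setinfo.
    Everything below is derived from those primitives."""

    def exists(self, path):
        try:
            self.getinfo(path)
        except FileNotFoundError:
            return False
        return True

    def isdir(self, path):
        return self.exists(path) and self.getinfo(path)["is_dir"]

    def writetext(self, path, text):
        with self.openbin(path, "w") as f:
            f.write(text.encode("utf-8"))

    def readtext(self, path):
        with self.openbin(path, "r") as f:
            return f.read().decode("utf-8")


class DictFS(BaseFS):
    """Toy in-memory filesystem keyed by absolute '/'-separated paths."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = {}  # path -> bytes
        self.extra = {}  # path -> metadata stored by setinfo

    def getinfo(self, path):
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if path in self.dirs:
            return {"name": name, "is_dir": True}
        if path in self.files:
            return {"name": name, "is_dir": False, "size": len(self.files[path])}
        raise FileNotFoundError(path)

    def listdir(self, path):
        if path not in self.dirs:
            raise FileNotFoundError(path)
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in (*self.dirs, *self.files)
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def makedir(self, path):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        self.dirs.add(path)

    def openbin(self, path, mode="r"):
        if "w" in mode:
            def save(data):
                self.files[path] = data
            return _WriteBuffer(save)
        if path not in self.files:
            raise FileNotFoundError(path)
        return io.BytesIO(self.files[path])

    def remove(self, path):
        del self.files[path]

    def removedir(self, path):
        if self.listdir(path):
            raise OSError("directory not empty: " + path)
        self.dirs.discard(path)

    def setinfo(self, path, info):
        self.getinfo(path)  # raises if the path does not exist
        self.extra.setdefault(path, {}).update(info)
```

Adding another backend in this toy means writing only the seven primitives; the conveniences come for free from the base class.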

So what is new?

Cleaner code

The new project is a unified Python 2 and 3 code base. It's legacy-free code with 100% test coverage.

I wanted it to be bullet-proof because PyFilesystem is an integral part of Moya. Moya's static server uses PyFilesystem to serve assets. Here's a screenshot of that in action:

Moya uses PyFilesystem in its static server. This screenshot shows what happens when you statically serve an FTPFS.

Moving and copying got simpler.

Here's how you compress your projects directory as a zip file:

from fs.copy import copy_fs
copy_fs("~/projects", "zip://~/projects.zip")

This works because the fs.copy and fs.move modules accept both FS objects and FS URLs.
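Accepting either an already-open object or a string that describes one is a handy pattern in its own right. Here is a stdlib-only sketch of it; `copy_dir_to_zip` is a hypothetical function, not part of fs.copy. It accepts the destination either as a path string or as an open `zipfile.ZipFile`, and closes only what it opened itself.

```python
import os
import zipfile


def copy_dir_to_zip(src_dir, dst):
    """Copy every file under src_dir into a zip archive.

    dst may be a path string (opened and closed here) or an already
    open zipfile.ZipFile (left open for the caller to manage).
    """
    opened_here = isinstance(dst, str)
    zip_file = zipfile.ZipFile(os.path.expanduser(dst), "w") if opened_here else dst
    try:
        src_dir = os.path.expanduser(src_dir)
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Zip member names are relative and always use '/'.
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zip_file.write(full_path, arcname)
    finally:
        if opened_here:
            zip_file.close()
```

Usage mirrors the copy_fs call above: `copy_dir_to_zip("~/projects", "~/projects.zip")`.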

Simple file information

File information has been categorised under a few namespaces, so you can request only the information you are interested in (potentially avoiding needless system calls). Here's an example:

>>> from fs import open_fs
>>> my_fs = open_fs('.')
>>> info = my_fs.getinfo('setup.py', namespaces=['details', 'access'])
>>> info.name
'setup.py'
>>> info.is_dir
False
>>> info.user
'will'
>>> info.permissions
Permissions(user='rw', group='r', other='r')
>>> info.modified
datetime.datetime(2016, 11, 27, 0, 17, 29, tzinfo=<UTC>)
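The point of namespaces is that each group of information costs something different to fetch. A stdlib-only sketch of the idea follows; `get_info` and its dict layout are hypothetical, loosely modelled on the namespaces above rather than on PyFilesystem's `Info` class. Only the `basic` namespace is always filled in, and the `details` and `access` namespaces share a single `os.stat` call that is skipped entirely when neither is requested.

```python
import os
import stat
from datetime import datetime, timezone


def get_info(path, namespaces=()):
    """Return file information grouped by namespace."""
    info = {"basic": {"name": os.path.basename(path), "is_dir": os.path.isdir(path)}}
    if "details" in namespaces or "access" in namespaces:
        st = os.stat(path)  # one stat call serves both namespaces
        if "details" in namespaces:
            info["details"] = {
                "size": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            }
        if "access" in namespaces:
            # stat.filemode gives an ls-style string such as '-rw-r--r--'.
            info["access"] = {"permissions": stat.filemode(st.st_mode)}
    return info
```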

Directory scanning is more powerful

The original PyFilesystem had a design flaw that we were unfortunately stuck with: when you listed a directory, you couldn't tell which paths were files and which were directories in a single call. One workaround was to make one call to retrieve just the directories and another to retrieve the files, but making two calls to retrieve a directory listing was inefficient for network filesystems. Another workaround used stat information, but that only worked for the OS filesystem.

In fs 2.0, the directory scanning method (scandir) returns Info objects, which have an is_dir flag, so there is no need for any workarounds. There is also a page parameter which allows you to paginate large directories (handy if you have millions of files).
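On the OS filesystem, Python's own `os.scandir` also reports whether each entry is a directory, usually without an extra system call. The sketch below is a hypothetical helper, not fs 2's `scandir`. It combines that with `itertools.islice` to return one page of a listing as `(name, is_dir)` pairs, with `page` given as a `(start, end)` tuple of indexes.

```python
import itertools
import os


def scandir_page(path, page=None):
    """List (name, is_dir) pairs for a directory, optionally one page of it.

    page is an optional (start, end) tuple; entries come in whatever
    order the OS returns them, so pages follow that order too.
    """
    with os.scandir(path) as entries:
        pairs = ((entry.name, entry.is_dir()) for entry in entries)
        if page is not None:
            start, end = page
            pairs = itertools.islice(pairs, start, end)
        # Materialise the page before the scandir iterator is closed.
        return list(pairs)
```

Because `islice` stops consuming once it reaches `end`, fetching the first page of a huge directory does not require reading the whole listing.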

Directory walking has a simpler interface.

To complement the directory scanning, there is a new directory walking class which supports filtering by wildcard pattern. Having a separate object do the directory walking allows for more flexibility. For instance, the copy_fs function accepts an optional walker parameter, which can be used to specify which files should be copied.

Here's how you would use a Walker object to compress only .py files, while ignoring Git directories:

from fs.copy import copy_fs
from fs.walk import Walker
py_walker = Walker(filter=['*.py'], exclude_dirs=['*.git'])
copy_fs("~/projects", "zip://~/projects.zip", py_walker)
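Conceptually, a walker like this is a recursive traversal that prunes excluded directories and filters file names by wildcard. Here is a stdlib-only sketch using `os.walk` and `fnmatch`. The `walk_files` function and its `patterns` argument are hypothetical and do not reflect how `fs.walk.Walker` is implemented.

```python
import fnmatch
import os


def walk_files(root, patterns=None, exclude_dirs=None):
    """Yield paths of files under root (relative to it) matching any pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        if exclude_dirs:
            # Pruning dirnames in place stops os.walk from descending into them.
            dirnames[:] = [
                d for d in dirnames
                if not any(fnmatch.fnmatch(d, pat) for pat in exclude_dirs)
            ]
        for name in filenames:
            if patterns is None or any(fnmatch.fnmatch(name, pat) for pat in patterns):
                yield os.path.relpath(os.path.join(dirpath, name), root)
```

For example, `walk_files(os.path.expanduser("~/projects"), patterns=["*.py"], exclude_dirs=["*.git"])` selects the same files as the Walker above would be expected to.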

Closer to the metal

Not all filesystems are created equal: a zip file is very different from your hard drive, and completely different from a filesystem in the cloud. Designing a single API that makes them interchangeable is challenging. The original PyFilesystem API took the approach of supporting a lowest common denominator of features, and simply didn't support things that weren't more or less universal (such as permissions). The new version has a different philosophy and attempts to expose as much as possible.

Serializable API

The new API is designed to be very easy to serialize, making it easier to implement network filesystems with JSONRPC, XML, REST, etc.

In the early days I considered such things to be a novelty, but it's proven to be something that developers often want. The new API should make that less of a challenge.
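One reason an API like this serializes easily is that file information can be represented as plain nested dictionaries of strings, numbers and booleans, grouped by namespace. The sketch below assumes such a raw-dict layout (the field names, and the modification time stored as a Unix timestamp rather than a datetime, are illustrative assumptions) and round-trips it through JSON, as a JSON-RPC or REST transport might.

```python
import json

# Hypothetical raw info for one file, grouped by namespace. Every value is
# JSON-compatible (the modification time is a Unix timestamp, not a datetime).
raw_info = {
    "basic": {"name": "setup.py", "is_dir": False},
    "details": {"size": 1024, "modified": 1480205849.0},
}

# What a server might send over the wire...
payload = json.dumps(raw_info)

# ...and what the client reconstructs on the other side.
received = json.loads(payload)
assert received == raw_info
print(received["basic"]["name"], received["details"]["size"])
```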

Should you upgrade?

There is a lot of functionality packed into the original PyFilesystem, such as exposing filesystems over HTTP and other network protocols, OS integration (FUSE and Dokan), command line apps, etc. All very cool stuff, but it did make for a lengthy list of requirements. The new version is leaner and requires only a handful of pure-Python dependencies. The other cool stuff will be distributed as additional namespace packages.

If you require any of those features, you may want to stick with the pre-2.0 version until I've ported everything. Ditto if you need S3 support or one of the third-party implementations.

Otherwise, I think you will find fs 2.0.0 a solid addition to your standard library.

How to install

You can install fs with pip as follows:

pip install fs

Add the -U switch if you want to upgrade from a previous version.

Alternatively, you can check out PyFilesystem2 on GitHub.

Feedback?

Now is the best time to suggest changes and new features. Comments and PRs most welcome!

Arthur

Hi, PyFilesystem looks fantastic! Why is it not available through Anaconda (conda install)?

Will McGugan

Hi, PyFilesystem looks fantastic!

Thanks!

Why is it not available through Anaconda (conda install)?

I couldn't say. Never used Anaconda...

Dave Martens

Arthur, I use Anaconda and PyFilesystem installed fine via pip into my Anaconda-specific site-packages directory (although I had a dependency on Visual C++ 9.0 because I installed from Win10).

Arthur

Thank you, Dave, for the info. How did you do that? I also needed VC++ 9; I'm on Win10 too.

Dave Martens

Arthur, I was able to download VC++ 9 for Win10 at https://www.visualstudio.com/post-download-vs/?sku=vspython# (well this is the "success" page but you can click the "retry" link to restart the download). After installing VC++, pip was all I needed. If you have other issues, let me know and I'll help if I can.

iPixelOldC

Hmm, this is how I did it (sorry, my English is poor). First, download scandir from http://www.lfd.uci.edu/~gohlke/pythonlibs. Second, run pip install scandir-1.4-cp35-cp35m-win_amd64.whl. Finally, run pip install fs. Why don't I use the Visual C++ build tools or other C++ build tools? Because I hate them and I'm tired of them; they always fail and never succeed! I hope http://www.lfd.uci.edu/~gohlke/pythonlibs can help you :D

Will McGugan

Thanks! I'll direct Windows users there. Installing in Windows is a pain in general.

Dave Martens

Will,

I discovered PyFilesystem2 yesterday and will be putting it through its paces today. It looks really nice, so thank you very much for your continued contributions to the community.

I also see that Dropbox support is mentioned in the issues list and that several options existed for PyFilesystem (1). Dropbox support would be fantastic.

Thanks and Best Regards, Dave

Will McGugan

Great. Let me know if you have any questions!

Dave Martens

Thanks - I just emailed you a couple of documentation-related suggestions that occurred to me when reading the official documentation and installing fs on Win10.

PhilM

Hi, thank you for your work.

I was wondering if there's any interest in a mountable (read-only) HTTP filesystem: one for [filename, url] pairs and one for Apache HTTP directory listings. Some questions:

1) Would these FSs replace the current HTTPFS, or would they go into contrib? Or does that depend on which license I'd release them under?

2) What modules am I allowed to use? Is Beautiful Soup acceptable for HTTPApacheFS, or am I limited to regex? Can I use extra threads for the HTTP requests?

3) How much am I allowed to use ad-hoc methods to optimize for real-world HTTP? E.g. currently my FS makes HTTP requests with a fixed block size for reading, while a dynamic one would be better for mounted files. The OS or VLC reading metadata and chapter information throughout a video file requires lots of seeking and favors a small block size, but when reading a file sequentially, making too many HTTP requests would decrease speed considerably.

4) Is there any reason why archivefs and sqlitefs are not mentioned in /PyFilesystem/pyfilesystem/blob/master/docs/contrib.rst? I didn't know they existed!

Will McGugan

Hi Phil,

I was wondering if there's any interest in a mountable (read-only) HTTP filesystem: one for [filename, url] pairs and one for Apache HTTP directory listings. Some questions:

Definitely for the HTTPFS. Not sure what you mean by the "[filename, url]" pairs. Is that to mount individual URLs on a file path? There was functionality for that in the original version, but I abandoned it because I didn't believe there was a use case for it. I might reconsider if you have one. For the Apache directory listings, I think that might not be generic enough to go into the core library. Maybe best left as an external project.

1) Would these FSs replace the current HTTPFS, or would they go into contrib? Or does that depend on which license I'd release them under?

I've abandoned the contrib module. Basically I want to be able to maintain everything in the core library.

The HTTPFS is worthwhile adding to fs2. It shouldn't be hard to port. It would be great if you wanted to contribute that.

2) What modules am I allowed to use? Is Beautiful Soup acceptable for HTTPApacheFS, or am I limited to regex? Can I use extra threads for the HTTP requests?

No to Beautiful Soup, I'm afraid. I think it's a bit too heavyweight in terms of dependencies. Threads are fine, as is pretty much everything in the stdlib.

3) How much am I allowed to use ad-hoc methods to optimize for real-world HTTP? E.g. currently my FS makes HTTP requests with a fixed block size for reading, while a dynamic one would be better for mounted files. The OS or VLC reading metadata and chapter information throughout a video file requires lots of seeking and favors a small block size, but when reading a file sequentially, making too many HTTP requests would decrease speed considerably.

Pretty much anything goes as long as it works according to documentation, and it passes the tests. You could use the getmeta and setmeta methods to tweak behaviour without adding additional methods.
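The block-size trade-off Phil describes is a classic adaptive read-ahead problem. As an illustration only (the `ReadAheadPolicy` class and its default sizes are hypothetical, not part of fs or of any HTTPFS), a policy can double the request size while reads stay sequential and drop back to a small block after a seek. It assumes the caller consumes each returned block before asking for the next one.

```python
class ReadAheadPolicy:
    """Decide how many bytes to request next from a remote file."""

    def __init__(self, min_block=16 * 1024, max_block=4 * 1024 * 1024):
        self.min_block = min_block
        self.max_block = max_block
        self.block = min_block
        self.expected_offset = None  # where a sequential read would continue

    def next_request(self, offset):
        """Return (offset, length) for a read starting at offset."""
        if offset == self.expected_offset:
            # Sequential access: grow the block to cut down on round trips.
            self.block = min(self.block * 2, self.max_block)
        else:
            # A seek (e.g. a player jumping to metadata): start small again.
            self.block = self.min_block
        self.expected_offset = offset + self.block
        return offset, self.block
```

Each `(offset, length)` pair would map naturally onto an HTTP `Range` request.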

4) Is there any reason why archivefs and sqlitefs are not mentioned in /PyFilesystem/pyfilesystem/blob/master/docs/contrib.rst? I didn't know they existed!

Those are external libs, and contrib.rst lists the filesystems in the fs.contrib module. I would like to make it easier to discover implementations. I'm looking into hosting a wiki on pyfilesystem.org.

logofiduv

The new version is leaner and requires only a handful of pure-Python dependencies.

I think scandir is not pure Python. Or is it an optional dependency?

Will McGugan

Good point. It is part of the standard library in Python 3, but requires an external library for Python 2. I'll look into making that optional.
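A common way to make such a backport optional is a guarded import: use `os.scandir` where the standard library provides it (Python 3.5+), otherwise try the `scandir` package from PyPI, and otherwise fall back to `os.listdir`. This is a generic sketch, not how fs itself handles it; the `list_entries` helper is hypothetical.

```python
import os

try:
    from os import scandir  # standard library since Python 3.5
except ImportError:
    try:
        from scandir import scandir  # PyPI backport for older Pythons
    except ImportError:
        scandir = None  # neither available; use the slower fallback below


def list_entries(path):
    """Return sorted (name, is_dir) pairs for the entries in path."""
    if scandir is not None:
        return sorted((entry.name, entry.is_dir()) for entry in scandir(path))
    # Fallback: an extra stat call per entry via os.path.isdir.
    return sorted(
        (name, os.path.isdir(os.path.join(path, name))) for name in os.listdir(path)
    )
```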

Simone

Hi Will! I'm looking into adopting PyFilesystem for an upcoming job: I need to generate some files (via templates) and export them via FTP. Older versions of PyFilesystem had expose functionality (including via FTP, if I understood correctly), while the current version doesn't seem to. Is there something I'm missing here?

BTW, many thanks for your great work!

Simone

Will McGugan

Hi Simone,

Yeah, fs2 doesn't support expose yet. If you need that you might want to stick with fs1 for the moment...

Will

Simone

OK, got it!

Thanks S.

Bill Gaylord

It does not seem like it ever will, unless the community using it builds the expose functionality. :( The plan is to make expose a separate module, but it has not been worked on at all yet. Just saying what I have found out... :(

Gilles Lenfant

Hi,

The Django storage support is a great feature in a project I'm working on. It seems to have been dropped in this new release. Is it a pending feature, does it require an add-on, or should I pin fs to the latest pre-2.0 version?

Thanks for PyFilesystem.

Will McGugan

Hi Gilles,

The Django Storage would probably be an external project. You might want to pin to the current version until that is done. It will be a while before there are as many supported filesystems for fs2.

Will

Jonathan MacCarthy

I also think PyFilesystem looks quite useful; thank you for putting it out there. I have a question: it looks like PyFilesystem has its own unique set of methods compared to Python's own os and os.path modules. This means that PyFilesystem can't be dropped into codebases that use those modules extensively without rewriting. Instead, you adapted the native os into PyFilesystem's interface. What was your reasoning for this?

Thank you, again!

Will McGugan

Hi Jonathan,

The FS API improves upon the API offered by os and os.path by making it more user-friendly and a little less error-prone. The FS API also has to be able to expose sources of data which tend to be quite different from the filesystem offered by the OS.

So the goals of making it user-friendly and more general have influenced the API, rather than making any attempt to be a drop in replacement.

Will

Jonathan MacCarthy

Will,

I agree that the FS API is more user-friendly than that of the standard library. I will look into what it might take to overlay the standard API onto FS, to improve interoperability. I've got a lot of code that uses os and os.path; it'd be nice to import fs as os and see how it goes:-)

Thanks again for your great package.

Best, Jon

Michael Delgado

Hello! We were dependent on the development version of fs1 (we use conn_kwargs in s3fs). Now that fs2 has been released on PyPI as fs==2.0, our setuptools installs are breaking. Any chance you could release the current fs1 version on GitHub as 0.5.5?

Will McGugan

Hi Michael. I'll do that in the next few days...

Bill Gaylord

Are ways to expose the FSs planned for PyFS2?

Will McGugan

Yes. It would likely be in a separate project though. So you would pip install fs.expose.

Bill Gaylord

Any news on pyfilesystem2? Considering pyfilesystem (1) is still the only one that allows actually using the filesystems outside python.

Will McGugan

PyFilesystem2 has been released. Current version is 2.0.2.

Bill Gaylord

I know it's released; I was mostly asking whether any work has been done on fs.expose to make it usable for things other than Python.

FabienP

Hi, I just found PyFilesystem and I feel that I'm going to use it (and enjoy it) very soon!

Just to mention: since Python 3.4 and the new pathlib module, the example you gave on the PyPI page for counting lines of Python code using the standard library is not accurate anymore. You can actually do:

import os
from pathlib import Path

def count_python_loc(path):
    count = 0
    for file in path.glob('**/*.py'):
        with file.open() as python_file:
            count += sum(1 for line in python_file if line.strip())
    return count

path_to_projects = Path(os.path.expanduser('~/projects'))
print(count_python_loc(path_to_projects))

Which is much more like what you describe using fs, but of course your example remains true for Python 2, which does not have pathlib in its standard library.

Perhaps adding an exclude_dirs argument to your count_python_loc example, along with the whole copy_fs example, would better show the value of fs.

Anyway, great job here!