PyFilesystem 2.0 Released
I'd like to announce version 2.0.0 of PyFilesystem, which is now available on PyPI.
PyFilesystem is a Python module I started some time in 2008, and since then it has been very much a part of my personal standard library. I've used it in personal and professional projects, as have many other developers and organisations.
Recap
If you aren't familiar with PyFilesystem, it's an abstraction layer for filesystems. Essentially anything with files and directories (hard drive, zip file, FTP server, network filesystem, etc.) may be wrapped with a common interface. With it, you can write code that is agnostic as to where the files are physically located.
Here's a quick example that recursively counts the non-blank lines of Python code in a directory:
def count_python_loc(fs):
    """Count non-blank lines of Python code."""
    count = 0
    for path in fs.walk.files(filter=['*.py']):
        with fs.open(path) as python_file:
            count += sum(1 for line in python_file if line.strip())
    return count

from fs import open_fs
projects_fs = open_fs('~/projects')
print(count_python_loc(projects_fs))
The fs argument to count_python_loc is an FS object, which encapsulates everything you would need to do with a filesystem. Because of this abstraction, the same code will work with any filesystem. For instance, counting the lines of code in a zip file is a single-line change:
projects_fs = open_fs('zip://projects.zip')
See my previous posts on PyFilesystem for more back-story.
Why the update?
A lot has happened since 2008. Python 3 happened. The IO library happened. Scandir happened. And while PyFilesystem has kept up with those developments, the code has suffered from the weight of small incremental changes. Perhaps not to the degree that it required a rewrite, but a new version gave me the opportunity to make improvements to the API which couldn't be done without breaking a lot of code.
The rewrite was guided by 8 years of observations regarding what developers wanted from the library: what worked well, what felt awkward, and the numerous edge cases in creating the illusion that all filesystems work alike (they really don't). A lot of agonizing has gone into the design of the new API to make it simple to use without sacrificing functionality. This often required breaking up large methods into more atomic methods that do one thing and do it well.
Another motivation for this version was to make it easier to implement new filesystems. It turns out that all you need to do to implement a filesystem is write 7 methods, which I find somewhat remarkable. The new API has been designed to make it much easier to develop your own filesystems in general, with more of the heavy lifting being done by the base class. I'm hoping this will encourage more developers to release new filesystems, and for pre-2.0.0 filesystems to be ported (which I'll gladly assist with).
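To make the "7 methods" idea concrete, here's a standard-library-only sketch of the pattern: an abstract base class declares the essential methods and builds convenience methods on top of them, so a backend only has to fill in the essentials. This is not PyFilesystem's real base class; MiniFS, DictFS and the simplified signatures are hypothetical. The method names follow the seven that the fs 2 documentation lists as essential (getinfo, listdir, makedir, openbin, remove, removedir, setinfo).

```python
import abc
import io


class MiniFS(abc.ABC):
    """Hypothetical base class: subclasses supply 7 essential methods and
    inherit convenience methods that are written once, in terms of them."""

    # Essential methods (signatures simplified for this sketch).
    @abc.abstractmethod
    def getinfo(self, path): ...  # -> {"name": str, "is_dir": bool}; KeyError if missing
    @abc.abstractmethod
    def listdir(self, path): ...  # -> list of names in the directory
    @abc.abstractmethod
    def makedir(self, path): ...
    @abc.abstractmethod
    def openbin(self, path, mode="r"): ...  # -> binary file object
    @abc.abstractmethod
    def remove(self, path): ...
    @abc.abstractmethod
    def removedir(self, path): ...
    @abc.abstractmethod
    def setinfo(self, path, info): ...

    # Derived methods: every subclass gets these for free.
    def exists(self, path):
        try:
            self.getinfo(path)
        except KeyError:
            return False
        return True

    def isdir(self, path):
        return self.exists(path) and self.getinfo(path)["is_dir"]

    def writetext(self, path, text, encoding="utf-8"):
        with self.openbin(path, "w") as f:
            f.write(text.encode(encoding))

    def readtext(self, path, encoding="utf-8"):
        with self.openbin(path, "r") as f:
            return f.read().decode(encoding)


class _WriteBuffer(io.BytesIO):
    """In-memory file that saves its contents into the store when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store, self._path = store, path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class DictFS(MiniFS):
    """Toy in-memory filesystem: directories map to None, files to bytes."""

    def __init__(self):
        self._store = {"/": None}

    def getinfo(self, path):
        if path not in self._store:
            raise KeyError(path)
        return {"name": path.rsplit("/", 1)[-1], "is_dir": self._store[path] is None}

    def listdir(self, path):
        prefix = path.rstrip("/") + "/"
        names = [p[len(prefix):] for p in self._store if p.startswith(prefix) and p != prefix]
        return sorted(n for n in names if "/" not in n)  # direct children only

    def makedir(self, path):
        self._store[path] = None

    def openbin(self, path, mode="r"):
        if "w" in mode:
            return _WriteBuffer(self._store, path)
        return io.BytesIO(self._store[path])

    def remove(self, path):
        del self._store[path]

    def removedir(self, path):
        del self._store[path]  # toy version: no check that it's empty

    def setinfo(self, path, info):
        pass  # this toy store keeps no extra metadata


mem = DictFS()
mem.makedir("/docs")
mem.writetext("/docs/readme.txt", "hello")
print(mem.listdir("/docs"))              # ['readme.txt']
print(mem.readtext("/docs/readme.txt"))  # hello
print(mem.isdir("/docs"))                # True
```

Because exists, isdir, readtext and writetext only call the essential methods, any new backend that implements those seven gets them without writing extra code.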
So what is new?
Cleaner code
The new project is a unified Python 2 and 3 codebase. It's legacy-free code with 100% test coverage.
I wanted it to be bulletproof because PyFilesystem is an integral part of Moya. Moya's static server uses PyFilesystem to serve assets. Here's a screenshot of that in action:
Moving and copying got simpler
Here's how you compress your projects directory as a zip file:
from fs.copy import copy_fs
copy_fs("~/projects", "zip://~/projects.zip")
This works because the fs.copy and fs.move modules accept both FS objects and FS URLs.
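For comparison, this is roughly what that one-liner replaces if you only have the standard library: walk the tree and add each file to a zip archive. It's a sketch, not how fs.copy works internally, and zip_directory is a hypothetical name.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Copy every file under src_dir into a new zip archive at zip_path."""
    src_dir = os.path.expanduser(src_dir)
    zip_path = os.path.expanduser(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Store each file under its path relative to src_dir;
                # zipfile converts OS separators to "/" in archive names.
                arcname = os.path.relpath(full_path, src_dir)
                archive.write(full_path, arcname)
```

Calling zip_directory("~/projects", "~/projects.zip") mirrors the copy_fs call above, except that empty directories are skipped. Targeting, say, an FTP server instead would mean writing a different function entirely, which is the point of accepting FS URLs.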
Simple file information
File information has been categorised under a few namespaces, so you can request only the information you are interested in (potentially avoiding needless system calls). Here's an example:
>>> from fs import open_fs
>>> my_fs = open_fs('.')
>>> info = my_fs.getinfo('setup.py', namespaces=['details', 'access'])
>>> info.name
'setup.py'
>>> info.is_dir
False
>>> info.user
'will'
>>> info.permissions
Permissions(user='rw', group='r', other='r')
>>> info.modified
datetime.datetime(2016, 11, 27, 0, 17, 29, tzinfo=<UTC>)
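Here's a standard-library sketch of the namespace idea for the local disk: the caller names the namespaces it wants, and only those sections are built and returned. get_raw_info and its exact keys are illustrative assumptions, loosely modelled on the basic, details and access namespaces shown above; PyFilesystem's own OS filesystem implementation differs.

```python
import os
import stat


def get_raw_info(path, namespaces=()):
    """Return file info grouped by namespace; 'basic' is always included."""
    st = os.stat(path)
    info = {
        "basic": {
            "name": os.path.basename(path),
            "is_dir": stat.S_ISDIR(st.st_mode),
        }
    }
    if "details" in namespaces:
        info["details"] = {
            "size": st.st_size,
            "modified": st.st_mtime,  # seconds since the epoch
        }
    if "access" in namespaces:
        info["access"] = {
            "uid": st.st_uid,
            "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        }
    return info
```

On a local disk one os.stat call covers all of these, so the savings are small; on a network filesystem, where each extra piece of information may cost another request, asking only for what you need matters more.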
Directory scanning is more powerful
The original PyFilesystem had a design flaw that we were unfortunately stuck with: when you listed a directory, you couldn't know which paths were files and which were directories in a single call. One workaround was to make one call to retrieve just the directories and another to retrieve just the files, but making two calls per directory listing was inefficient for network filesystems. Another workaround used stat information, but that only worked for the OS filesystem.
In fs 2.0, the directory scanning methods (such as scandir) return Info objects which have an is_dir flag, so there's no need for any workarounds. There is also a page parameter which allows you to paginate large directories (handy if you have millions of files).
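On the local disk, os.scandir (Python 3.5+) gives the same single-call benefit: each entry can report whether it's a directory, usually without an extra stat. The sketch below lists (name, is_dir) pairs and accepts an optional (start, end) slice in the spirit of the page parameter; list_dir and its arguments are hypothetical, not PyFilesystem's API.

```python
import itertools
import os


def list_dir(path, page=None):
    """Yield (name, is_dir) pairs for the entries in path.

    page, if given, is a (start, end) pair of indexes into the listing,
    so a huge directory can be consumed one slice at a time.
    """
    with os.scandir(path) as entries:
        selected = entries
        if page is not None:
            start, end = page
            selected = itertools.islice(entries, start, end)
        for entry in selected:
            # DirEntry.is_dir() can usually answer from the directory listing
            # itself, without a separate stat call per entry.
            yield entry.name, entry.is_dir()
```

scandir returns entries in arbitrary, filesystem-defined order, so pages are only meaningful if the directory doesn't change between calls.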
Directory walking has a simpler interface
To complement the directory scanning, there is a new directory walking class which supports filtering by wildcard pattern. Having an external object do the directory walking allows for more flexibility. For instance, the copy_fs function accepts an optional walker parameter which can be used to specify which files should be copied.
Here's how you would use a Walker object to compress only .py files, while ignoring Git directories:
from fs.copy import copy_fs
from fs.walk import Walker
py_walker = Walker(filter=['*.py'], exclude_dirs=['*.git'])
copy_fs("~/projects", "zip://~/projects.zip", py_walker)
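To show how a walker with wildcard filters can work, here's a standard-library sketch built on os.walk and fnmatch. The filter and exclude_dirs arguments are named to mirror the Walker call above, but walk_files is a hypothetical helper, not fs.walk.Walker.

```python
import fnmatch
import os


def walk_files(root, filter=None, exclude_dirs=None):
    """Yield paths (relative to root) of files matching any pattern in filter,
    never descending into directories whose names match exclude_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        if exclude_dirs:
            # Pruning dirnames in place stops os.walk from entering them.
            dirnames[:] = [
                d for d in dirnames
                if not any(fnmatch.fnmatch(d, pat) for pat in exclude_dirs)
            ]
        for name in filenames:
            if filter is None or any(fnmatch.fnmatch(name, pat) for pat in filter):
                yield os.path.relpath(os.path.join(dirpath, name), root)
```

Pruning dirnames in place is what keeps os.walk out of excluded directories entirely, rather than walking them and discarding the results. For example, walk_files(os.path.expanduser("~/projects"), filter=["*.py"], exclude_dirs=["*.git"]) should select roughly the files the Walker above would.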
Closer to the metal
Not all filesystems are created equal; a zip file is very different from your hard drive, and completely different from a filesystem in the cloud. Designing a single API that makes them interchangeable is challenging. The original PyFilesystem API took the approach of supporting a lowest common denominator of features, and simply didn't support things that weren't more or less universal (such as permissions). The new version has a different philosophy and will attempt to expose as much as possible.
Serializable API
The new API is designed to be very easy to serialize, making it easier to implement network filesystems with JSONRPC, XML, REST, etc.
In the early days I considered such things to be a novelty, but it's proven to be something that developers often want. The new API should make that less of a challenge.
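Because file information is plain nested dicts of strings, numbers and booleans, carrying a call over the network needs nothing more than JSON. Below is a sketch of a JSON-RPC 2.0 style request handler for a getinfo call; handle_request, FakeBackend, the params layout and the method whitelist are all assumptions for illustration, not a protocol PyFilesystem defines.

```python
import json

# Only these backend methods may be called remotely.
ALLOWED_METHODS = {"getinfo"}


class FakeBackend:
    """Stand-in filesystem whose getinfo returns JSON-friendly raw info."""

    def getinfo(self, path, namespaces=None):
        info = {"basic": {"name": path.rsplit("/", 1)[-1], "is_dir": False}}
        if namespaces and "details" in namespaces:
            info["details"] = {"size": 1024}  # made-up value for the demo
        return info


def handle_request(backend, request_text):
    """Decode a JSON-RPC 2.0 style request, call the backend, encode the reply."""
    request = json.loads(request_text)
    if request.get("method") not in ALLOWED_METHODS:
        # -32601 is JSON-RPC 2.0's "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    method = getattr(backend, request["method"])
    result = method(*request.get("params", []))
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "getinfo",
                      "params": ["/setup.py", ["details"]]})
print(handle_request(FakeBackend(), request))
```

Dispatching only whitelisted names matters here: without the check, a client could call any attribute of the backend by name.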
Should you upgrade?
There is a lot of functionality packed into the original PyFilesystem, such as exposing filesystems over HTTP and other network protocols, OS integration (FUSE and Dokan), command line apps, etc. All very cool stuff, but it did make for a lengthy list of requirements. The new version is leaner and requires only a handful of pure-Python dependencies. The other cool stuff will be distributed as additional namespace packages.
If you require any of those features, you may want to stick with the pre-2.0 version until I've ported everything. Ditto if you need S3 support or one of the third-party implementations.
Otherwise, I think you will find fs 2.0.0 a solid addition to your standard library.
How to install
You can install fs with pip as follows:
pip install fs
Add the -U switch if you want to upgrade from a previous version.
Alternatively, you can check out PyFilesystem2 on GitHub.
Feedback?
Now is the best time to suggest changes and new features. Comments and PRs most welcome!
Hi, PyFilesystem looks fantastic! Why is it not available through Anaconda (conda install)?
Thanks!
I couldn't say. Never used Anaconda...
Arthur, I use Anaconda and PyFilesystem installed fine via pip into my Anaconda-specific site-packages directory (although I had a dependency on Visual C++ 9.0 because I installed from Win10).
Thank you, Dave, for the info. How did you do that? I also needed VC++ 9; I'm on Win 10 too.
Arthur, I was able to download VC++ 9 for Win10 at https://www.visualstudio.com/post-download-vs/?sku=vspython# (well this is the "success" page but you can click the "retry" link to restart the download). After installing VC++, pip was all I needed. If you have other issues, let me know and I'll help if I can.
Hmmm, this is my way (sorry, poor English):
first, download scandir (download here: http://www.lfd.uci.edu/~gohlke/pythonlibs);
second, pip install scandir-1.4-cp35-cp35m-win_amd64.whl;
finally, pip install fs.
Why don't I use the Visual C++ build tools or other C++ build tools? Because I hate them and I'm tired of them: they always fail, never succeed!!! I hope http://www.lfd.uci.edu/~gohlke/pythonlibs can help you :D
Thanks! I'll direct Windows users there. Installing on Windows is a pain in general.
Will,
I discovered PyFilesystem2 yesterday and will be putting it through the paces today. It looks really nice so thank you very much for your continued contributions to the community.
I also see that Dropbox support is mentioned in the issues list and that several options existed for PyFilesystem(1). DB support would be fantastic.
Thanks and Best Regards, Dave
Great. Let me know if you have any questions!
Thanks, I just emailed you a couple of documentation-related suggestions that occurred to me when reading the official documentation and installing fs on Win10.
Hi, thank you for your work.
I was wondering if there's any interest in a mountable (read-only) HTTP filesystem: one for [filename, url] pairs and one for Apache HTTP directory listings. Some questions:
1) Would these FSs replace the current HTTPFS, or would they go into contrib? Or does that depend on which license I'd release them under?
2) What modules am I allowed to use? Is Beautiful Soup acceptable for HTTPApacheFS, or am I limited to regex? Can I use extra threads for the HTTP requests?
3) How much am I allowed to use ad-hoc methods to optimize for real-world HTTP? E.g. currently my FS makes a fixed-blocksize HTTP request for reading, while a dynamic one would be better for mounted files: the OS or VLC reading metadata and chapter information throughout a video file requires lots of seeking and favors a small blocksize, but when sequentially reading a file, making too many HTTP requests would decrease speed by a lot.
4) Any reason why archivefs and sqlitefs are not mentioned in /PyFilesystem/pyfilesystem/blob/master/docs/contrib.rst? I didn't know they existed!
Hi Phil,
Definitely for the HTTPFS. Not sure what you mean by the "[filename, url]" pairs. Is that to mount individual URLs on a file path? There was functionality for that in the original version, but I abandoned it because I didn't believe there was a use case for it. I might reconsider if you have one. For the Apache dir lists, I think that might not be generic enough to go into the core library. Maybe best left as an external project.
I've abandoned the contrib module. Basically, I want to be able to maintain everything in the core library. The HTTPFS is worthwhile adding to fs2. It shouldn't be hard to port. It would be great if you wanted to contribute that.
No to Beautiful Soup, I'm afraid. I think it's a bit too heavyweight in terms of dependencies. Threads are fine, as is pretty much everything in the stdlib.
Pretty much anything goes as long as it works according to documentation, and it passes the tests. You could use the getmeta and setmeta methods to tweak behaviour without adding additional methods.
Those are external libs, and contrib.rst lists the filesystems in the fs.contrib module. I would like to make it easier to discover implementations. I'm looking into hosting a wiki on pyfilesystem.org.
I think scandir is not pure Python. Or is it an optional dependency?
Good point. It is part of the standard library in Python 3 (since 3.5), but requires an external library for Python 2. I'll look into making that optional.
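A common way to make a backport optional is a guarded import that prefers the standard library. A minimal sketch, assuming the scandir backport package on PyPI (which provides a scandir function) as the Python 2 fallback; HAS_SCANDIR is a hypothetical flag name.

```python
try:
    from os import scandir  # Python 3.5+: in the standard library
except ImportError:  # Python 2 and Python 3.4
    try:
        from scandir import scandir  # optional backport from PyPI
    except ImportError:
        scandir = None  # callers fall back to os.listdir plus os.stat

# Code paths can check this flag instead of repeating the import dance.
HAS_SCANDIR = scandir is not None
```

With this in place the backport becomes a performance extra rather than a hard requirement, since the listdir/stat path still works without it.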
Hi Will! I'm looking into adopting PyFilesystem for an upcoming job: I need to generate some files (via templates) and export them via FTP. Older versions of PyFilesystem had expose functionality (via FTP too, if I understood correctly), while the current version doesn't seem to. Is there something I'm missing here?
BTW, many thanks for your great work!
Simone
Hi Simone,
Yeah, fs2 doesn't support expose yet. If you need that you might want to stick with fs1 for the moment...
Will
OK, got it!
Thanks S.
It does not seem like it ever will, unless the community using it builds the expose functionality. :( The plan is to make expose a separate module, but it has not been worked on at all yet. Just saying what I have found out... :(
Hi,
The Django storage support is a great feature in a project I'm working on. It seems to have been dropped in this new release. Is it a pending feature, does it require an add-on, or should I pin fs to the latest pre-2.0 version?
Thanks for PyFilesystem.
Hi Gilles,
The Django Storage would probably be an external project. You might want to pin to the current version until that is done. It will be a while before there are as many supported filesystems for fs2.
Will
I also think PyFilesystem looks quite useful; thank you for putting it out there. I have a question: it looks like PyFilesystem has its own unique set of methods compared to Python's own os and os.path modules. This means that PyFilesystem is not suitable to drop into codebases that use those modules extensively without rewriting. Instead, you adapted the native os into PyFilesystem's interface. What was your reasoning for this?
Thank you, again!
Hi Jonathan,
The FS API improves upon the API offered by os and os.path by making it more user-friendly and a little less error-prone. The FS API also has to be able to expose sources of data which tend to be quite different from the filesystem offered by the OS.
So the goals of making it user-friendly and more general have influenced the API, rather than making any attempt to be a drop-in replacement.
Will
Will,
I agree that the FS API is more user-friendly than that of the standard library. I will look into what it might take to overlay the standard API onto FS, to improve interoperability. I've got a lot of code that uses os and os.path; it'd be nice to import fs as os and see how it goes :-)
Thanks again for your great package.
Best, Jon
Hello! We were dependent on the developer version of fs1 (we use conn_kwargs in s3fs). Now that fs2 has been deployed on pip as fs==2.0, our setuptools installs are breaking. Any chance you could deploy the current fs1 version on GitHub as 0.5.5?
Hi Michael. I'll do that in the next few days...
Are ways to expose FSs planned for PyFS2?
Yes. It would likely be in a separate project, though, so you would pip install fs.expose.
Any news on PyFilesystem2? Considering PyFilesystem (1) is still the only one that allows actually using the filesystems outside Python.
PyFilesystem2 has been released. The current version is 2.0.2.
I know it's released; I was mostly asking if any work has been done on fs.expose to make it usable for stuff other than Python.
Hi, I just found out about PyFilesystem and I feel that I'm going to use it (and enjoy it) very soon! Just to mention, since Python 3.4 and the new pathlib, the example you gave on the PyPI page for counting Python code lines using the standard libs is not accurate anymore. You can actually do:
Which is much more like what you describe using fs, but of course your example remains true for Python 2, as it does not have pathlib in its standard library. Perhaps adding an exclude_dirs argument to your count_python_loc example, and the whole copy_fs example, would better show the appeal of fs.
Anyway, great job here!