A blog about my mess
It appears that, since I spend (way too much) time writing code, I should be writing too, even if I'm not sure it will interest anyone. So let's go down this path again, fighting my procrastination and perfectionism daemons while sharing what I'm doing and what I'm discovering.
Let's start with why, how and what I'm doing on (one of) my (too many) latest projects.
The goal of the Baron project is to create an AST for the python programming language that guarantees a lossless conversion between the source code and the AST.
Not clear? Let's start with definitions.
AST stands for "abstract syntax tree": an abstract representation of what a source code file means from the compiler's/interpreter's point of view.
In general, when you execute (or compile) some source code, for example by running "python my_script.py", the interpreter/compiler parses the source file, transforms it into an AST, then transforms this AST into something it understands (bytecode, for example). (The reality is more complicated, with a lot more steps.)
While this is the most frequent use of an AST, it is not the only one: an AST can be used for everything related to code analysis, code modification, building tools, modifying the inner representation of code before sending it to the interpreter (some libs do that; py.test, for example, does it with asserts), etc. Those are the cases that interest me.
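To give an idea of what "modifying the inner representation of code before sending it to the interpreter" looks like, here is a minimal sketch using the standard ast module. The AssertRewriter class and run_rewritten function are hypothetical names made up for this example: they give every "assert a == b" a message showing both values, then compile and execute the modified tree. py.test's real assert rewriting is far more elaborate; this only shows the principle.

```python
import ast


class AssertRewriter(ast.NodeTransformer):
    """Hypothetical transformer: add a message to `assert a == b` statements."""

    def visit_Assert(self, node):
        test = node.test
        if (node.msg is None
                and isinstance(test, ast.Compare)
                and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Eq)):
            # Build the expression: "%r != %r" % (left, right)
            # Note: both sides are evaluated again when the assertion fails,
            # which matters if they have side effects.
            node.msg = ast.BinOp(
                left=ast.Constant(value="%r != %r"),
                op=ast.Mod(),
                right=ast.Tuple(elts=[test.left, test.comparators[0]], ctx=ast.Load()),
            )
        return node


def run_rewritten(source):
    """Parse, rewrite, then compile and run the modified AST."""
    tree = AssertRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # the new nodes need line numbers
    exec(compile(tree, "<rewritten>", "exec"), {})


try:
    run_rewritten("x = 1\nassert x == 2\n")
except AssertionError as error:
    print(error)  # 1 != 2
```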
So Baron is going to offer you an AST for the python language where the round trip source code → Baron's AST → source code gives back an identical source code.
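For contrast, a quick check with the standard library's ast module shows that its round trip is not lossless: comments, redundant parentheses and spacing disappear. (This uses ast.unparse, which only exists in python 3.9 and later.)

```python
import ast

source = "x = (1 +   2)  # keep me\n"

# Parse into the standard AST, then generate source code back from it.
regenerated = ast.unparse(ast.parse(source))

print(repr(regenerated))  # 'x = 1 + 2'
```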
Yes, python's standard library makes it very easy to play with python's AST, which is very cool. The problems are that:
isinstance everywhere) but none of them is very cool or pleasant to use (I've already done that several times).
There are several other existing tools, but none of them guarantees a lossless conversion. The closest one is the tokenize lib, which comes very close to a lossless conversion but isn't made for it (and I was too lazy to hack it for that, and it's "only" the tokenizing part).
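To see how close tokenize gets, here is a small round trip with the standard tokenize module. With the full position information, untokenize rebuilds this simple input exactly, comment included. Keep in mind that the documentation only guarantees that the result tokenizes back to the same tokens (spacing may change), and what you get is a flat stream of tokens, not a tree.

```python
import io
import tokenize

source = "x = (1 +   2)  # keep me\n"

# Tokenize the source, keeping the full 5-tuples (type, string, start, end, line).
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# untokenize uses the start/end positions to put the whitespace back.
regenerated = tokenize.untokenize(tokens)

print(regenerated == source)
```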
This question has already been partially answered, but let's hit the nail one last time: having an AST with a lossless conversion to and from the source code will allow, among other things:
python ←→ javascript tool but the currently available tools don't really permit that in a lossless fashion)
Also, a fun consequence: once you have this high-level AST, if you add a conversion between this AST and another format, you'll be able to edit your python code in that format and convert it back to its original (or modified) source code. For example, if you transform this AST into XML, you'll be able to use BeautifulSoup, lxml, XPath or XSLT on your python source code. I'm not sure this will be useful, but I like the idea.
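Here is a rough sketch of the XML idea. Since Baron's AST isn't shown here, it uses the standard (lossy) ast module as a stand-in, and the mapping is an assumption of mine: one element per node, child nodes nested under an element named after their field, plain values stored as attributes. It then queries the result with the limited XPath subset of the standard xml.etree.ElementTree instead of lxml.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node):
    """Hypothetical mapping from a stdlib ast node to an ElementTree element."""
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node, wrapped in an element named after the field.
            ET.SubElement(element, field).append(to_xml(value))
        elif isinstance(value, list):
            # A list of child nodes (non-node items are simply skipped here).
            container = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    container.append(to_xml(item))
        elif value is not None:
            # Plain values (names, constants...) become attributes.
            element.set(field, str(value))
    return element


tree = to_xml(ast.parse("def f():\n    return 1\n\ndef g(x):\n    return x\n"))

names = [function.get("name") for function in tree.iter("FunctionDef")]
matches = tree.findall(".//FunctionDef[@name='g']")
print(names, len(matches))
```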
My current strategy can be summarized by one quote from this very interesting article:
The details of this paper aren't quite as important as the general concept: a compiler is nothing more than a series of transformations of the internal representation of a program. The authors promote using dozens or hundreds of compiler passes, each being as simple as possible. Don't combine transformations; keep them separate.
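To illustrate the "many simple, separate passes" idea, here is a toy python sketch. The three passes (split, group, classify) are hypothetical and deliberately naive; they are not Baron's actual passes. Each one does a single transformation, they are chained by a tiny driver, and because no pass drops any characters, the original text can always be rebuilt from the output.

```python
import re


def split(source):
    """Pass 1: cut the text into word runs, whitespace runs and single characters."""
    return re.findall(r"\w+|\s+|.", source)


def group(chunks):
    """Pass 2: glue two-character operators such as '==' back together."""
    result = []
    for chunk in chunks:
        if result and result[-1] + chunk in ("==", "!=", "<=", ">=", "**", "//"):
            result[-1] += chunk
        else:
            result.append(chunk)
    return result


def classify(chunks):
    """Pass 3: attach a (very rough) token type to each chunk."""
    def kind(chunk):
        if chunk.isspace():
            return "SPACE"
        if chunk[0].isdigit():
            return "NUMBER"
        if chunk[0].isalpha() or chunk[0] == "_":
            return "NAME"
        return "OP"
    return [(kind(chunk), chunk) for chunk in chunks]


def run_passes(source, passes):
    """Feed the output of each pass into the next one."""
    data = source
    for transform in passes:
        data = transform(data)
    return data


source = "if a == 10:\n    b = a // 3\n"
tokens = run_passes(source, [split, group, classify])
print(tokens[:4])

# No pass throws characters away, so the source can be rebuilt exactly.
assert "".join(value for _, value in tokens) == source
```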
The plan:
You can find the current version of the code here.
Next post will probably be on how I've written the splitting part.
Thanks for reading and have a nice day,
At Software Freedom Day 2012 at Hackerspace Brussels, I presented two tools that I'm working on:
Thanks to the organizers, this was a very cool event: I met cool people, learned about biology hacking and Sozi, and now I really want to see a genetic hacking lab open in Brussels.
With Kirsten, I gave a talk at Freedom Not Fear 2012 that could be summarized as "Hacktivism-101": she presented how the EU institutions work and how and when you can act, and I presented the common tools that we use.
Here are my slides:
Thanks to the organizers, this was a very cool event: I met cool people and learned things.
I use virtualenv all the time, for every new project, for every lib or project I test. I also use ipython and ipdb all the time (and django nearly all the time).
Problem: ipython, ipdb and django aren't shipped with virtualenv, and installing them takes some time, which breaks my flow when I need them.
Luckily, mitsuhiko has released virtualenv-tools, which lets you update the paths of a copied virtualenv.
A bit of shell scripting later, here is the result:
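mkve ()
{
    # Install the required tools if they are missing
    command -v pip >/dev/null 2>&1 || { echo "installing pip"; sudo aptitude install python-pip; }
    command -v virtualenv >/dev/null 2>&1 || { echo "installing virtualenv"; sudo pip install virtualenv; }
    command -v virtualenv-tools >/dev/null 2>&1 || { echo "installing virtualenv-tools"; sudo pip install virtualenv-tools; }

    # Build the template virtualenv once, with the usual packages preinstalled
    if [ ! -e ~/.myvirtualenv/ ]
    then
        virtualenv --distribute ~/.myvirtualenv/
        ~/.myvirtualenv/bin/pip install ipdb django django_pdb django_extensions django_debug_toolbar
    fi

    # Don't copy into an existing ./ve (cp -r would nest the template inside it)
    if [ -e ve ]
    then
        echo "ve already exists in $(pwd)"
        return 1
    fi

    # Copy the template here, fix its hard-coded paths and activate it
    cp -r ~/.myvirtualenv/ ve
    virtualenv-tools --update-path "$(pwd)/ve" ve
    source ve/bin/activate
}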
Just type mkve: it checks that you have everything it needs installed, builds the template virtualenv in ~/.myvirtualenv/ with the packages if it isn't there yet, then copies it to ve in your current directory, updates its paths and activates it.
It's even faster than building a new virtualenv (once the template virtualenv exists).
I often want to share small files/documents/others with friends.
Current solutions kind of suck for this; we have:
Normally, sharing a common git repository containing all the files would be a very stupid and inefficient idea. Luckily, someone out there has created a wonderful tool called git annex.
Git annex is a git extension whose purpose is to manage "media-like" files. The big idea is that instead of storing those files in git, you store references to them (symlinks) and git annex handles the files themselves for you. This way, the size of your repository doesn't explode over time.
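To make the "store references instead of files" idea concrete, here is a toy imitation in python. The toy_annex_add helper is hypothetical: it moves a file into a content-addressed store (each file named after the SHA-256 of its content) and leaves a symlink in its place. The real git annex has its own key format and keeps the content under .git/annex, so this only illustrates the concept.

```python
import hashlib
import os
import shutil
import tempfile


def toy_annex_add(path, store):
    """Hypothetical helper: move `path` into `store` and leave a symlink behind."""
    with open(path, "rb") as handle:
        key = hashlib.sha256(handle.read()).hexdigest()
    os.makedirs(store, exist_ok=True)
    target = os.path.join(store, key)
    if os.path.exists(target):
        os.remove(path)  # identical content is already stored once
    else:
        shutil.move(path, target)
    # What git would track is this small symlink, not the file's content.
    os.symlink(os.path.abspath(target), path)
    return key


# Usage in a throwaway directory.
workdir = tempfile.mkdtemp()
photo = os.path.join(workdir, "photo.jpg")
with open(photo, "wb") as handle:
    handle.write(b"pretend this is a big picture")

key = toy_annex_add(photo, os.path.join(workdir, "store"))
print(os.path.islink(photo), key[:12])
```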
A common usage scenario looks like this (without the details):
So, the idea is to have one common central repository that everyone can push to and that contains all the references. This way, everyone knows what is available, the server doesn't have to store that much, you don't have to store everything, and when you want to get a file, git annex will pull it directly from your friend's place.
Or you can simply remove the central component and just share git repositories between friends.
Advantages:
Disadvantages:
It's not very hard to imagine expanding this concept, because we always need an excuse to code and learn new things.
I don't think it would be very hard to build a series of scripts to ease the use of this tool; git libraries already exist in several languages.
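As a starting point, such a script could simply wrap the git annex command line. Here is a minimal python sketch: share and fetch are hypothetical helper names, and the run parameter lets you swap the real subprocess call for print (or anything else) to see what would be executed. It assumes you are inside a repository where git annex init has already been run.

```python
import subprocess


def share(path, message=None, run=subprocess.check_call):
    """Hypothetical helper: annex a file, commit the symlink, sync with the other repositories."""
    commands = [
        ["git", "annex", "add", path],
        ["git", "commit", "-m", message or "share %s" % path],
        ["git", "annex", "sync"],
    ]
    for command in commands:
        run(command)
    return commands


def fetch(path, run=subprocess.check_call):
    """Hypothetical helper: get the content behind a symlink from a remote that has it."""
    command = ["git", "annex", "get", path]
    run(command)
    return command


# Dry run: print the commands instead of executing them.
share("photos/holidays.jpg", run=print)
```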
Things that can be done:
Remark: the "git annex sync" part might be problematic for permission reasons.