Wednesday 21 August 2013

Weeks 8-10


Progress has been slower than I expected these past few weeks. I have been working on integrating the architecture and layer design that was prototyped in the sandbox application into the public_rest code. The result wasn't as neat as I had hoped, and there were quite a few considerations that had to be taken into account in the context of the MM-Core API.

The primary problem during this period has been the "transformation" of data. The way I have gone about it, which, admittedly, can be improved a lot, is to use the former mailman.client classes as "data adaptors": objects that wrap the data returned from Core. These adaptors are used by the CoreInterface to perform the necessary CRUD operations on the REST API, and are associated with the local models as well.

I was then able to create some generic interface functions that use dispatch functions for each model, and return adaptors.
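Very roughly, the idea looks something like the sketch below (the names — `ListAdaptor`, `fetch_json`, `get_all` — are illustrative, not the actual public_rest code):

```python
class ListAdaptor(object):
    """Hypothetical data adaptor: wraps the raw JSON dict that Core
    returns for a mailing list and exposes it as plain attributes."""

    def __init__(self, data):
        self._data = data          # decoded JSON payload from Core

    @property
    def fqdn_listname(self):
        return self._data.get('fqdn_listname')

    def to_dict(self):
        # Convenient form for populating the local Django model.
        return dict(self._data)


# Hypothetical dispatch table: object type -> (Core endpoint, adaptor class).
ADAPTORS = {
    'list': ('lists', ListAdaptor),
}


def get_all(object_type, fetch_json):
    """Generic interface function: fetch a Core collection and wrap
    each entry in the adaptor registered for that object type."""
    endpoint, adaptor_class = ADAPTORS[object_type]
    payload = fetch_json(endpoint)   # e.g. an HTTP GET helper on the interface
    return [adaptor_class(entry) for entry in payload.get('entries', [])]
```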

Certain models like Membership are actually exposed as multiple endpoints at the REST level. A Membership at `mm-rest` is an object with a "role" of either member, moderator or owner. But at Core, we have separate endpoints for all three of them, so those three are combined at the Interface (arguably, this should be the data adaptor's job). So when we query the memberships, we create our adaptor objects from all three endpoints and return the combined adaptor list.
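A rough sketch of how that combination works (the roster paths approximate Core's member/moderator/owner endpoints; `fetch_json` and `MembershipAdaptor` are assumptions):

```python
class MembershipAdaptor(object):
    """Illustrative wrapper around one Core roster entry."""
    def __init__(self, data, role):
        self.data = data
        self.role = role


ROLES = ('member', 'moderator', 'owner')


def get_memberships(list_id, fetch_json):
    """Query all three Core roster endpoints and return one flat list
    of membership adaptors, each tagged with its role."""
    adaptors = []
    for role in ROLES:
        payload = fetch_json('lists/{0}/roster/{1}'.format(list_id, role))
        for entry in payload.get('entries', []):
            adaptors.append(MembershipAdaptor(entry, role=role))
    return adaptors
```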

While working on the prototype, I was using a uniform REST API exposed via the Django REST Framework, which made things easy. That isn't the case with Core, whose REST API is less consistent in its terminology and in how it exposes primary and secondary resources.

Some entities are models unto themselves at the mm-rest layer, but are treated as secondary resources by the API. Case in point: List Settings. Each MailingList model has a foreign key that associates it with a ListSettings model. The model goes through the same set of CRUD operations at the mm-rest layer, but, at Core, it is exposed as part of the list itself, via the `/lists/{fqdn_listname}/config` endpoint.

To handle such inconsistencies, I pass around an "object_type" parameter, which creates resources based on the type of the object. It is not an elegant solution, but it works for my purpose and I am keeping it for now.
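In essence, it boils down to a lookup table like this hypothetical one (the mapping and helper name are assumptions; `{fqdn_listname}` gets filled in from the related list):

```python
# Hypothetical mapping from object_type to how the Core URL fragment is built.
# Primary resources get their own collection; "secondary" ones like list
# settings hang off their parent resource instead.
ENDPOINTS = {
    'domain': 'domains',
    'list': 'lists',
    'listsettings': 'lists/{fqdn_listname}/config',
}


def url_for(object_type, **kwargs):
    """Build the Core URL fragment for a given object type."""
    return ENDPOINTS[object_type].format(**kwargs)


# url_for('listsettings', fqdn_listname='test@example.com')
# -> 'lists/test@example.com/config'
```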

There have also been problems regarding some things which the Core simply does not allow me to do (like creating email addresses), so those are on hold for now, and will be (possibly) addressed after GSoC is over.

For now, I am able to handle saves for primary objects like Domain, MailingList, and Membership, both locally and at Core. The job for this week is to finish the other entities and work on "filter" (and "get" as an extension) as well. Once that is complete, I think implementing user authentication and a consistent design for our own DRF-based API will be the only things left to do in this project.

One month left in GSoC. Time is passing quickly!

Monday 5 August 2013

Weeks 6-7


Starting off with the good news that I have successfully passed the mid-term evaluation for GSoC! :)

Onwards! As mentioned in the previous post, Richard and I have been working on a "sandbox" application that simulates the communication interface between all the components in the system. That has been our primary focus for the past few weeks, and it is slowly evolving into a good "generic" interface which will hopefully be usable directly in our mm-rest application.

Data Sync and Layered Architecture

The problem of data sync has been our primary concern and focus during this time. Thus far, we have designed and implemented operations for the basic CRUD (Create, Read, Update, Delete) facilities in the application, for both locally and remotely backed objects.


The architecture of the system is layered. Each layer can (and most likely will) be running on a different machine with its own separate database, but a layer might also be local, with its data stored in the same database. So we had to allow for both of these cases.

It works something like this:

               Inner Layer   <------------- Middle Layer <------------ Outer Layer

Although we can add as many layers as we like, each layer brings a LOT of overhead in terms of database operations and HTTP requests.

The interface of this architecture has been made a lot more generic, and I hope to fix a few hacks I have made so that it behaves like a seamless library for any locally and remotely backed models.

This layering, although present in the system from the beginning, has been made a little more explicit, with each layer having a relation with the one below it. For each model, there exists a mirror copy at each layer (all three layers can have somewhat different fields, but adjacent layers should share at least one common field which can be used for querying).


For the locally backed objects, that field is a OneToOne or ForeignKey relation. For the remotely backed objects, it's a partial URL unique to the related object, which is generated when the object is created and establishes the relation.
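A minimal Django sketch of one such mirrored model pair, assuming hypothetical `CoreList`/`LocalList` names rather than the actual sandbox code:

```python
from django.db import models


class CoreList(models.Model):
    """Inner-layer copy of a mailing list (e.g. backed directly by Core)."""
    fqdn_listname = models.CharField(max_length=254, unique=True)


class LocalList(models.Model):
    """Outer-layer mirror of the same list.

    When the layer below lives in the same database, the link is the
    OneToOne relation; when it is remote, only the partial URL is stored
    and used to look the peer up over HTTP.
    """
    fqdn_listname = models.CharField(max_length=254, unique=True)
    peer = models.OneToOneField(CoreList, null=True, blank=True,
                                on_delete=models.SET_NULL)
    partial_url = models.CharField(max_length=254, blank=True)
```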

The problem at this point is that due to the machinery present in Django and DRF, we are currently making a lot more save operations, and even more problematic, more HTTP requests.

Apart from that, there are use cases which haven't been looked at properly yet, for example, the case where we might have to propagate immediate updates to the upper layers. For now, this sync will only behave as a *cache* which is updated if the current layer has nothing in it, or if some other criterion is satisfied (the object has expired because it is too old, a periodic cache update is due, etc.).
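In other words, reads follow a simple fall-through pattern, roughly like this (a sketch; `local`, `remote`, and `is_stale` are assumed helpers):

```python
def get_list(fqdn_listname, local, remote, is_stale):
    """Return the list from the current layer if we have a fresh copy,
    otherwise fall through to the layer below and cache the result."""
    obj = local.find(fqdn_listname)
    if obj is not None and not is_stale(obj):
        return obj
    fresh = remote.fetch(fqdn_listname)   # HTTP request to the layer below
    return local.store(fresh)             # cache it for the next lookup
```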

Diving into the ORM

I spent a lot of the past couple of weeks fighting the Django REST Framework, because the behavior I wanted was not working due to something DRF handles internally. The major problem in that regard was filtering objects and getting ALL objects at once.

On the surface, the two look like separate behaviors that should be easy to customize/extend. It was not quite that easy, since overriding all() was proving to be very difficult and causing my tests to fail. This happened because, internally, DRF made a call to all() for every data serialization that happened.

So instead of hacking all(), we had to go in and change the DRF ViewSets and the DRF filtering backend (another library, known as django-filter) to use querysets directly and not call all() unless they absolutely *have* to.

This made things easy, as now, we could just define all() in terms of a filter() that filters nothing. :)

During the course of struggling against all(), I learned quite a few things about the Django ORM: how QuerySets work, how to override them, and how to easily make them work with Managers. In order to maintain the DRY principle, I also searched around for a hack (there is nothing explicit in the ORM regarding this) and ended up using django-model-utils, an awesome little library that defines a few convenient model utilities for Django.

One of them, PassThroughManager, made it easy to define everything on the QuerySet itself, and took care of exposing those methods on the Manager.
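Put together, the pattern looks something like this sketch (the `MailingList` model and its queryset are placeholders, not the real code):

```python
from django.db import models
from django.db.models.query import QuerySet
from model_utils.managers import PassThroughManager


class MailingListQuerySet(QuerySet):
    """All the custom behaviour lives on the QuerySet..."""

    def filter(self, *args, **kwargs):
        # The local-vs-remote lookup logic would hook in here.
        return super(MailingListQuerySet, self).filter(*args, **kwargs)

    def all(self):
        # all() is just a filter() that filters nothing.
        return self.filter()


class MailingList(models.Model):
    fqdn_listname = models.CharField(max_length=254, unique=True)

    # ...and PassThroughManager exposes those same methods on the manager.
    objects = PassThroughManager.for_queryset_class(MailingListQuerySet)()
```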

Other than that, I spent the time writing tests, finishing off the design documentation for our basic operations and did the Rocky movie marathon on Sunday! :)

Wednesday 24 July 2013

Week 5

Adventures in Git and Bazaar

As the mid-term evaluation for GSoC draws nearer, I have to submit all the work I've done so far for review to Mailman Mentors. The GNU Mailman project uses Launchpad to host all the code. This made things quite difficult, since the work I've been doing so far has been in Git, hosted on my mentor Richard's server, while Launchpad uses the Bazaar version control system.

In order to upload the code that we've been working on in Git to Launchpad, we used a plugin called git-remote-bzr. Using this plugin, we were able to interact with Launchpad via git by making an individual bzr branch into a git "remote". Then you are able to fetch/push to that remote just like any other.

It sounds quite simple, but it took pain and suffering to actually work it all out in a decent way without messing everything up. For one, git-remote-bzr would only work with the master branch (and no other).  So after trying the git syntax for updating a remote branch with a different local branch (among other things), I finally ended up just merging master with the branch I actually wanted to upload (which was fortunately simple in this case) and then pushing out to master.

Another trouble that I encountered was that, at one point in Postorius' commit history, the commits diverged. The branch which I had "forked" on Launchpad was a few commits ahead of the common point where the branches diverged.

So I had to delete that and make a new branch (specific to that common point), and then ensure that all subsequent changes that I pushed came directly after the common point. It should have worked, but even after getting to the common commit, I was still getting the error saying I had extra commits.

In the end, Richard pointed me to a git equivalent of `bzr missing`, which told me I had the extra commits (the commits made after the divergence point). Removing the .git/bzr directory and doing a fetch again fixed that, and I was able to push!

Keeping up with the schedule

You learn in Software Engineering class about how projects have requirements that change frequently, how things don't always go the way you planned them out, and almost always, you will end up doing things differently than you envisioned. That has been my experience so far in this project. I would say this has been a very important lesson in real world development.

Starting out, I had a plan about what I thought would be the way the summer would go and how I would have to work on the project. I had a timeline, which was a rough sketch of what I expected to happen, and how I expected to achieve it.  Suffice to say, that was not at all how things worked out once I started doing the actual work. 

Transitioning from my schedule to the new one was easy enough. It was not like the milestones I had were set in stone, and I quickly realized that the way to achieve the goals was to work on the new schedule that we were following. The way to do that is to leave enough "wiggle room" and not set very hard expectations about what might or might not happen. You never know what problems you might face, how long it might take to tackle them, and how that will affect your plans.

Sandbox Sandwich

This week, instead of focusing on the actual repository, Richard and I worked on a separate "sandbox" repository, which simulated how the interactions between the 3 major components (Mailman Core, REST Interface, API Clients) are handled. This was done using 3 different Django apps which handle models and use an Interface to communicate with the other layer. Not all that exciting, but it has been helping me understand the way the system should work a lot better than before. :)

Tuesday 16 July 2013

Week 3-4

The past few days haven't been very good. I have had a lot of slow days where I spent the majority of the time sick, although I did manage to get some work done.

The DRF (Django REST Framework) integration has started, and I created Serializers and ViewSets for a few "First Class" models, which DRF uses to expose a REST API.

Serialization is the process of translating data that can be stored (in this case, coming from the SQLite database via the Django ORM) into a format which is easy to transfer across the network (JSON/XML). De-serialization is the opposite, where we reconstruct the database-friendly format from the JSON/XML. DRF supports this by providing Serializer classes, with some built-in classes for easy serialization of Django models. Integrating them wasn't difficult.

The Django REST Framework comes with several built-in Serializers which integrate very well with Django models. This works out very well, because we get certain things (like pagination) for "free" without actually writing code for that particular feature. The resulting API also maps the relationships between the models well. It also allows for many extensible features in case we want to customize anything instead of using the defaults (which will be the case in this project, I suspect).
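For a model like MailingList, the DRF side ends up being only a few lines, roughly like this sketch (built from the stock DRF classes; the model import path and field names are assumptions):

```python
from rest_framework import routers, serializers, viewsets

from public_rest.models import MailingList   # assumed import path for the app's model


class MailingListSerializer(serializers.ModelSerializer):
    class Meta:
        model = MailingList
        fields = ('fqdn_listname', 'display_name')


class MailingListViewSet(viewsets.ModelViewSet):
    """List/detail/create/update/delete for MailingList, pagination included."""
    queryset = MailingList.objects.all()
    serializer_class = MailingListSerializer


# Wiring the viewset into a router generates the URL patterns automatically.
router = routers.DefaultRouter()
router.register(r'lists', MailingListViewSet)
```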

The data being exposed right now via DRF is in a format that it generates by default. In the future, that would change to providing a customized "scheme" which we will use in the data format (something like HAL for the JSON format).

In short, this current version of the API gives us a (somewhat blurry) picture of what we can expect when the project is finished.

After making a rough preview of the API, I have started working on what will be the hardest part of the project so far, which is to have the database of the API models (the ones that I have been working on) communicate with the Mailman Core database, and do read/write/update/delete operations on both of them simultaneously.

I had a discussion with my mentor about the role that the new models will play in regards to the Core, and he said to treat my models and the Core as Peers, instead of making the assumption that Core will be the "main" database. This gives us the advantage that we don't have to think about everything in terms of how it interacts with -Core, but can be independent of it, sharing only the data it requires.

To interact with the Core's API, I used practically the same code that was removed from mailman.client, with little to no modification as of now.

The models at Core and the models at the REST layer differ in their structure and their relationships, which is why it won't be as simple as dropping in a generic function that updates everything at the peer whenever something is saved in the Django models.

So far, I have managed to make it work for a single User model's write operations, so creating a new User from my models also creates a corresponding User at the Mailman Core.

Doing this wasn't easy, and I encountered a few roadblocks. At first, I was trying to do the creation/update at the peer via Django signals like "post_save" and do any API-related work when that signal was triggered, but that was getting more complicated than I expected. The next day, I figured out an (admittedly hacky) way to do the create/update of the User directly inside the model's .save() method.
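The hacky version looks roughly like this (a sketch; the `core_interface` helper and its methods stand in for the mailman.client-derived code and are not the actual names):

```python
from django.db import models

from public_rest.interface import core_interface   # hypothetical wrapper around the Core API code


class User(models.Model):
    display_name = models.CharField(max_length=254, blank=True)
    email = models.EmailField(unique=True)

    def save(self, *args, **kwargs):
        is_new = self.pk is None
        # First persist locally...
        super(User, self).save(*args, **kwargs)
        # ...then mirror the change at Mailman Core.
        if is_new:
            core_interface.create_user(email=self.email,
                                       display_name=self.display_name)
        else:
            core_interface.update_user(email=self.email,
                                       display_name=self.display_name)
```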

So right now, I'm in the middle of doing the task of communicating with -Core for other models, which will occupy the upcoming week. Hopefully, it won't be as painful as I'm thinking it is going to be. :)

Tuesday 9 July 2013

Week 2+

With the completion of the models in week 2, the aim was to create a "Simulator" for Postorius, which, as the name implies, was to simulate the functionality using the new models that I have been working on.

Week 2 was spent working on simultaneously updating the Models and any integrated changes that I had to do in Postorius, as well as removing any and all traces of the older mailman.client module, which was being used as a way for Postorius to communicate with the Internal Core API.

The purpose of removing mailman.client is to show that pretty much all of the functionality the Core exposes is now routed through the new REST models I have been working on, without anything going directly through mailman.client. This does not mean that the client code will become redundant or be removed completely. Part of the mailman.client code will still be used as a way to query the Core API; those queries will just go through the new code, where the results get cached in the local database to make lookups faster, instead of going directly to Postorius.

Most of the time was spent working out the branching model for the work that we were doing. This was the first time that I extensively used Git branches to separate out the various features of the project. It wasn't very obvious or easy, and it took a while for me to figure everything out. My mentor, Richard, has been extremely patient with me and helped me understand how everything works.

We created separate branches for making any changes in the Postorius code related to the models/views etc, and one for removing traces of mailman.client. Then, after all those changes were complete, I merged them into a new branch called m-simulator.

I also created a Trello board in order to keep track of the progress and any issues in the project. Since we aren't using GitHub/Bitbucket etc., this seemed like the best way to do it.

The plan for the upcoming week is to start integrating Django Rest Framework, in order to get a "preview" of the REST API we might get at the end of the project. Once that is done, we will work on a way for the new models to communicate with the core API, a job that was previously being done by mailman.client.

Lastly, the title of this blog is Week 2+ because I got sick at the end of the week and couldn't finish either this post, or continue the work for the project. So a couple of days were wasted there, and this post is being published later than it should have been.

That's it for now. Until next week.

Thursday 27 June 2013

Week 1

For my first week of the GSoC coding period, I focused on designing the models of my application. A model, in Django terminology, is the schema. Designing the schema is an important first step towards building the API. The Mailman core exposes information about Lists, Domains, Users, List Members etc., but these entities are not related to each other well enough. So I set about making connections between the various entities in the system using these models.

Basic models for Domain, MailingList, ListSettings, and something called a "Subscriber" (though we won't be sticking to that name for long) are good enough for now. I also managed to hook a few of these new models into the existing Postorius application. Not surprisingly, I had to change a few things to make them work properly, but I want to make as few changes as possible in the Postorius codebase.
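A stripped-down sketch of the kind of relations involved (the model and field names here are illustrative, not the actual schema):

```python
from django.db import models


class Domain(models.Model):
    mail_host = models.CharField(max_length=254, unique=True)


class ListSettings(models.Model):
    description = models.TextField(blank=True)


class MailingList(models.Model):
    domain = models.ForeignKey(Domain, related_name='mailing_lists',
                               on_delete=models.CASCADE)
    list_name = models.CharField(max_length=254)
    settings = models.OneToOneField(ListSettings, null=True,
                                    on_delete=models.SET_NULL)


class Subscriber(models.Model):
    mailing_list = models.ForeignKey(MailingList, related_name='subscribers',
                                     on_delete=models.CASCADE)
    address = models.EmailField()
    role = models.CharField(max_length=30, default='member')
```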

I started writing tests for these models as well! That was my favorite part of the entire week. Writing tests is a lot of fun! There's a sense of satisfaction that you get when you see all those tests passing without a hitch. I loved it, and am finally beginning to understand why some programmers are obsessed with TDD (Test Driven Development).

Apart from writing tests, something else that I've started to look into is schema migrations in Django using South. As my schema changed, I always had to test the new version out, and re-creating and syncing the database manually is one of those annoying things that, I now realize, I shouldn't have wasted so much time on. South takes care of everything by allowing a Django database to be migrated from one state to another as your models are updated. You can also move forwards and backwards from one state to another easily, allowing you to keep track of all the changes in the schema. It is somewhat analogous to Git, I think.

Speaking of Git, another thing that I learned (or re-learned, as I had known about it before but just never got in the habit of using it that often) is git's patch mode! This is an excellent tool to selectively stage and commit pieces of a large diff, so that every commit has a single focus and no unrelated changes go into it. This ensures that we have a proper record of changes that we may later want to look at, and it eases the process of reverting git history, which would be quite confusing if reverting one commit reversed not only the change you wanted but also the unrelated changes recorded in that commit. Anyway, patch mode is quite awesome, and you can find a great explanation (with videos!) by John Kary here.

If I'm being honest with myself, I must admit that this week turned out to be slow going. A couple of times, I got bogged down in the details and failed to see the big picture, which led me to waste time on bugs that could have easily been resolved. I should also have started writing tests much earlier than I actually did, which led to me wasting some time testing things out manually in the Django shell every time I made changes to the schema.

Hopefully, I will be much more productive in the upcoming weeks.

Now some non-code news. I've started working out of the new office of the company that I interned for last year, SD. SD's new office is an awesome place to focus solely on the work without the distractions of working from home. :)

And finally, my welcome package arrived, which, apart from the Visa cash card, also contained some goodies like a Diary, a Pen, and an awesome Sticker. Thanks, Google!

That's it for now. Until next week.



Tuesday 18 June 2013

The adventure begins!

Yesterday was my last exam. As of today, I no longer have to worry about any upcoming exams or other college-related things like assignments, and can completely focus on GSoC. :-)

This past week, my mentor, Richard, has been helping me get completely set up with the new layout for the project. The setup included an initializing script, "bootstrap.sh", which wasn't quite doing what we expected it to do (partially due to my derping and using zsh instead of bash), and we spent quite a lot of time debugging it.

Anyway, for the upcoming weeks, I'll be focusing on the application's models. So I spent the day reading the documentation for Django models and managers, and the code for how the custom User model works in django.contrib.auth.models. We will be working towards making extensible models, whose functionality can later be improved by plugging in "mixins". Mixins are a topic that I haven't studied deeply. I think I wrote a few mixins as part of a Coursera class last year in Ruby, but that was quite a while ago. So I'll be spending some time studying the mixin pattern, specifically for Python, and working out how I can use it in my models.
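As a first stab at the pattern: a Python mixin is just a small class that adds behaviour when included in another class's bases. A toy example (not the project's design):

```python
class DescribableMixin(object):
    """Adds a describe() method to any class that has a `name` attribute."""

    def describe(self):
        return '{0}: {1}'.format(type(self).__name__, self.name)


class MailingList(DescribableMixin):
    def __init__(self, name):
        self.name = name


print(MailingList('mailman-developers').describe())
# -> "MailingList: mailman-developers"
```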

I will then look into DRF (Django Rest Framework) and work on exposing these models via a REST interface.

That's it for now. Until next time!