Learning to build a web app from scratch: Part II

It’s been about six weeks since my original post, so overdue for an update. This project is still my main priority at work, and that’s probably not going to change anytime soon. Once again, the purpose of this app will be to analyze stock trades for institutional investors (hedge funds, mutual funds, pension funds). These posts track my thought process as I decide which technologies to employ and learn how to use them throughout this project.

About three weeks ago, one of our potential future customers graciously sent us six months of order data, which is amazing. The data is broken into three sets:

Parent orders (instructions from PM -> Trader)
Broker orders (instructions from Trader -> Broker)
Fills (actual executions on the street)

Since we got the data, it’s been full steam ahead trying to integrate the data into the app and slice it in various ways.

Here’s the current setup:

Framework: Django
Database: SQLite
Graphing toolkit: Highstock/Highcharts
Other JS plugins: Chosen.js (Django-chosen), Datatables
Dev environment: Sublime Text 2

Current breakdown

You can see what the app looks like so far from the above screenshot (potentially sensitive data blurred out). There are a few screens like this that allow the user to slice the data in different ways and see various relevant statistics and charts. It’s not anywhere near a finished state, but it’s at a point where I can use it to explore the data and look for potential points of interest. Here are my thoughts on the various components involved.

Django

Been very happy with Django so far. Main awesome things about it:

Querysets: All of the data backing the app can be filtered and accessed extremely easily using Django’s Queryset functionality. It’s basically magic. Any SQL query you can think of can be expressed with these super basic and intuitive calls. It’s perfect, because this app is all about pulling out different subsets of your raw data depending on the situation.
MVC: Model-view-controller is just a solid approach for constructing an app, and Django’s implementation is very straightforward (although in Django, views are called templates, and controllers are called views– I guess that’s a little confusing). Anyway, just define what your raw data models look like, and then it autogenerates an empty table in your database to house it. This table can be autopopulated from JSON or manually populated. Then in the view you write all of your logic to figure out what specific data you need for a page (as well as any processing you need to do on that data). A simple call at the end of the view passes your processed data along to the template, which is just the HTML layout of your page.
Template calls: There are these super flexible Django constructs that let you seamlessly integrate the data that got passed in from the view. Plus you can do various Python-like things using these constructs, like run loops or index into arrays/dictionaries. Super useful.
Forms: Django has some nice built-in functionality for forms and in particular forms involving model data. For example, I have a bunch of select fields that let the user choose one or several orders to examine– Django seamlessly pushes the options to the template and grabs the relevant data that gets posted when the form is submitted.
Database integration: Django pretty much does all of the backend database work automatically. Just define your models and let it know what type of database you want to use, and that’s it. Or if you already have a bunch of data (as in my case), Django can look at the raw data table and just autogenerate your models (the inspectdb management command does this). Amazing!
Code modification while the backend is running: This is kind of small, but very nice nonetheless. You can make changes to your code while the app is running and as soon as you save, the changes will be integrated into the app without needing to restart anything (I think the backend automatically restarts whenever it detects a modification to the source code).

And I’m sure there are lots of other great Django features that I haven’t stumbled upon yet. The only slight negative about Django so far is that it feels a little sluggish to me (maybe a Python thing?). But I have no reference point and have not tried to optimize my code at all, so this is pretty much an entirely unvalidated claim.

Highstock/Highcharts

This graphing software is pretty awesome. It lets you do all sorts of different interactive charts and works really nicely. Basically you pass it raw data in a simple format and then you have tons of options to control how it’s displayed. Combine that massive optionality with Django’s template calls (dynamic Python) and there’s a lot of potential. For example, I was able to figure out how to have the chart display dynamic marker sizes based on the size of the fill without *too much* trouble by doing this:

            var lgtwo = Math.log(2);
            series: [{ {% for key, value in stockdata.items %}
                    name: '{{ key }}', 
                    data: [{% for dp in value %} {
                        x:{{ dp.0 }}, 
                        y:{{ dp.1 }},
                        fills:{{ dp.2 }}, 
                        marker:{radius: Math.max(1, Math.log({{ dp.2 }} / 50) / lgtwo)} 
                    {% if not forloop.last %}}, {% endif %}
                        {% endfor %}
                    }], 
                    lineWidth : 0,
                    tooltip: {
                        pointFormat: '{series.name}: {point.y}<br>fills: {point.fills}'
                    },
                    {% if not forloop.last %}}, {
                    {% endif %}{% endfor %}
                }]

This is the part that defines the series data that will be charted. stockdata is a dictionary of orders passed from my view (key=order id, value=array of fill data tuples (time, price, quantity filled)). This is Python generating JSON inside of Javascript inside of HTML. Kinda crazy. And I have the size of the markers growing on a logarithmic scale because an individual fill can be anywhere from 1 share to 1,000,000 shares (or even more, but usually 100-1000).
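One alternative to looping inside the template, sketched here as an assumption rather than something taken from the app, is to build the same series structure in the view and serialize it once with json.dumps. The names marker_radius and build_series are hypothetical, and stockdata is assumed to have the shape described above (order id mapped to a list of (time, price, quantity) tuples).

```python
import json
import math


def marker_radius(quantity, base=50):
    """Log2-scaled marker radius with a floor of 1, mirroring the template formula."""
    if quantity <= 0:
        return 1
    return max(1, math.log2(quantity / base))


def build_series(stockdata):
    """Turn {order_id: [(ms, price, quantity), ...]} into a Highcharts-style series list."""
    series = []
    for order_id, fills in stockdata.items():
        points = [
            {
                "x": ms,
                "y": price,
                "fills": quantity,
                "marker": {"radius": marker_radius(quantity)},
            }
            for ms, price, quantity in fills
        ]
        series.append({
            "name": str(order_id),
            "data": points,
            "lineWidth": 0,
            "tooltip": {
                "pointFormat": "{series.name}: {point.y}<br>fills: {point.fills}"
            },
        })
    return series


# Made-up sample data, for illustration only.
sample = {"order-1": [(1349098200000, 25.10, 400), (1349098200500, 25.11, 50)]}
series_json = json.dumps(build_series(sample))
```

The template would then only need to drop series_json into the chart configuration (for example through Django's safe filter), which avoids the comma and brace bookkeeping in the nested loops. Whether that trade is worth it depends on how much per-point logic you want to keep in the template.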

Note on Django-chartit

In my original post, I was planning on trying out Django-chartit, which is a basic library that integrates Highcharts into Django. I got Django-chartit working with raw Bloomberg market data very soon after my original post, but decided to scrap it a few days later and just work with Highstock/Highcharts directly. I found that Django-chartit doesn’t really add much value. Basically, all it does is move your Highcharts configuration from Javascript to Python, destroying a good chunk of potential functionality in the process. Ideally, Django-chartit would take in a Queryset and some parameters (relevant terms, chart options) via a simple call and then just make everything work. Unfortunately, it requires you to format everything into a complicated Python dictionary that’s structured identically to the Highcharts configuration (and all it does is convert it into JSON). Oh well.

Django-chosen/Chosen.js

My app involves a lot of selecting– what brokers to examine, what orders to display, what layers to include, etc. Chosen is this sick jQuery plugin that makes awesome select and multiple select fields. (Select2 looks like another related and potentially interesting option.) Django-chosen integrates Chosen form fields into Django’s built-in form functionality, and it’s absolutely perfect for most of my needs. It takes a Django model, filters the objects in my database, and passes them into a Chosen field in my form in my template.

Datatables

Datatables is another jQuery plugin, this time for tables. It takes my massive unwieldy data and puts it into nice, quick, manageable tables. Like Highcharts, it has tons of options, but I haven’t messed around with it much.

Sublime Text 2

So originally I was using PyCharm for my development environment, and it was totally solid. But since then, I’ve switched to Sublime Text 2, and I’m liking it a lot. It has much less IDE functionality, especially out of the box, but it’s just such a beautiful text editor. It’s lightning fast and it has tons of useful shortcuts and cool features (like allowing the use of multiple cursors simultaneously). Plus, it is very easily extensible, so with a couple quick add-ons it’s halfway to what I was getting in PyCharm (for example, you can read about this guy’s Django/ST2 setup). By far the most useful package I’ve added to Sublime Text 2 has been SublimeLinter. It adds the equivalent of squiggly lines to your code when there’s a syntax error, and without it I don’t think I’d be able to program in ST2. The only thing that’s really missing is smarter code completion. ST2 has built-in code completion, but it’s based mostly on the variables and methods defined in your current file. It doesn’t pull any information from your imported libraries, which would be nice. I added the Djaneiro package, which incorporated a decent amount of Django autocompletion stuff, and packages like SublimeCodeIntel or SublimeRope might finish the job, but so far it hasn’t been enough of a problem for me to spend tons of time trying out various options. I’ll probably revisit these last two at some point in the future. The other major feature that’s lacking is a debugger and the ability to step through your code, but I don’t miss these too much.

Major obstacles to this point:

Other than just learning the basics of Python, Django, SQL, Javascript, HTML, etc. and figuring out how to make them talk to each other, one of the more annoying aspects has been dealing with timestamps. This project is particularly sensitive to timestamps as we’re looking to examine the stock market on a subsecond level. Also, I’m located in EST, and the stock market is open during EST hours, but market data is stored in GMT. Django by default takes GMT timestamps from the SQLite database and converts them into local Python datetimes, and then I need to pass them to the Javascript Highcharts object in milliseconds since epoch. Python has about a bajillion different ways to play around with datetimes, but unfortunately none of them allow you to convert to milliseconds particularly easily while maintaining timezone. Ultimately, the best way I found to make these timestamps Javascript friendly without losing too much resolution was to write this basic function, which converts times into seconds since epoch, multiplies by 1000 and then adds the microsecond component divided by 1000:

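import calendar
def ms_since_epoch(dt):
    # timetuple() keeps dt's wall-clock fields; timegm() then reads them as if they were UTC
    return 1000.0 * calendar.timegm(dt.timetuple()) + dt.microsecond / 1000.0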

It’s not too complex, but getting here was much harder than it should have been. I wish there was an easy Django way to just convert a SQL timestamp straight into Javascript ms since epoch.
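To make the timezone behavior of that function concrete, here is a small sketch contrasting two conversions for the same timezone-aware datetime. The names wall_clock_ms and utc_epoch_ms are hypothetical, the fill time is made up, and a fixed UTC-4 offset stands in for New York time purely for illustration (Python 3 standard library only).

```python
import calendar
from datetime import datetime, timedelta, timezone


def wall_clock_ms(dt):
    """Milliseconds built from dt's own wall-clock fields, read as if they were UTC."""
    return 1000 * calendar.timegm(dt.timetuple()) + dt.microsecond / 1000


def utc_epoch_ms(dt):
    """True milliseconds since the Unix epoch for a timezone-aware datetime."""
    return 1000 * calendar.timegm(dt.utctimetuple()) + dt.microsecond / 1000


# Hypothetical fill at 09:30:00.250 local time, using a fixed UTC-4 offset.
eastern = timezone(timedelta(hours=-4))
fill_time = datetime(2012, 10, 1, 9, 30, 0, 250000, tzinfo=eastern)

local_ms = wall_clock_ms(fill_time)   # chart axis would read 09:30 if rendered as UTC
true_ms = utc_epoch_ms(fill_time)     # chart axis would read 13:30 if rendered as UTC
offset_hours = (true_ms - local_ms) / 3600000
```

The two values differ by exactly the UTC offset. Highcharts renders timestamps as UTC by default, so the wall-clock form is the one that puts market-hours labels on the axis; reading that as the intent behind "maintaining timezone" is my assumption.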

The second challenge I’ve found was that the app is sometimes really slow, especially if I’m performing lots of complicated queries and processing the resulting data pieces one by one. This probably shouldn’t have been too surprising since there are hundreds of thousands of objects. One method I’ve found helpful has been creating supplemental models to house pre-processed data. That is, I run a single batch analysis on every single object and put the results into a separate table (where each object has a 1-to-1 correspondence with the original objects). Then, I can just do a single query and reference the processed values right away. This has made a massive difference in some cases for me, but obviously wouldn’t make sense in every scenario, like if you were performing analyses that involved multiple user inputs, and the number of possible combinations was enormous.
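The pre-processing idea can be sketched roughly as follows. This uses the standard-library sqlite3 module instead of Django models so that it stays self-contained, and every table name, column name, and sample row is hypothetical; the statistics (total quantity and volume-weighted average price) are just stand-ins for whatever the batch analysis computes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_order (id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE fill (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES parent_order(id),
        price REAL,
        quantity INTEGER
    );
    -- Supplemental table: exactly one row per parent order
    CREATE TABLE order_stats (
        order_id INTEGER PRIMARY KEY REFERENCES parent_order(id),
        total_quantity INTEGER,
        vwap REAL
    );
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO parent_order VALUES (?, ?)", [(1, "AAA"), (2, "BBB")])
conn.executemany(
    "INSERT INTO fill (order_id, price, quantity) VALUES (?, ?, ?)",
    [(1, 10.0, 100), (1, 11.0, 300), (2, 20.0, 500)],
)


def rebuild_order_stats(conn):
    """Batch step: recompute every order's summary in a single pass over the fills."""
    conn.execute("DELETE FROM order_stats")
    conn.execute("""
        INSERT INTO order_stats (order_id, total_quantity, vwap)
        SELECT order_id, SUM(quantity), SUM(price * quantity) / SUM(quantity)
        FROM fill
        GROUP BY order_id
    """)
    conn.commit()


def stats_for_symbol(conn, symbol):
    """Page-load step: one join against precomputed rows, no per-fill processing."""
    return conn.execute(
        """
        SELECT o.id, s.total_quantity, s.vwap
        FROM parent_order AS o
        JOIN order_stats AS s ON s.order_id = o.id
        WHERE o.symbol = ?
        """,
        (symbol,),
    ).fetchall()


rebuild_order_stats(conn)
```

The cost moves from every page load to the batch step, which has to be rerun whenever new fills arrive. As noted above, this only pays off when the set of precomputed values is small and fixed.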

Next steps:

The main thing that will be happening soon is integrating market data (right now it’s just execution reports from this one firm, but it’s tough to analyze the quality of an execution without knowing what was happening in the market as a whole). We’ll be buying some raw market data from Tickdata later this week (Bloomberg is no longer an option because they do not store historical tick data beyond six months back). Also, right now all the analysis is very simple and just done in native Python. It will be good to enable some more advanced metrics by incorporating R functionality.

For me personally, I’d really like to learn more Javascript/AJAX. It seems hugely powerful and my exposure has been relatively limited so far. A couple days ago, I came across this video from this year’s Google I/O conference which was an excellent introduction to Javascript. Eventually, I’d like to make the interface a lot sleeker/more user friendly. This also probably means I’ll need to get more comfortable with CSS.

2 thoughts on “Learning to build a web app from scratch: Part II”

Pingback: Storing and analyzing equity tick data (trades and quotes) using HDF5 / h5py « dcaisen
Dmitry Kursov says:

September 30, 2014 at 7:59 am

I know you have chosen the framework already but have you considered Oracle ADF framework? Jdeveloper IDE very slick and powerful as well, it generates a lot of stuff for you behind the scenes. There are few nice tutorials on youtube that demonstrate the power of Jdeveloper for rapid webapp development. My personal favorite feature is taskflows that allows you to describe page navigation declaratively without any coding.
If you did consider it in the past what made you to choose Django?

dcaisen

Daniel Aisen's site