April 14, 2014
This Heartbleed thing shook us all a little bit. When I first learned how everything works, from transistors all the way up to HTTP, I was awed by the power of our abstractions. This time I am in awe of their fragility. I mean fragile in the Taleb sense, i.e. neither able to improve under volatility and stress nor able to withstand them.
I am no security expert or anything of the sort, but there is one glaring fragility in all of this: the organizational structure of the entities that produce infrastructure. If SSL is so important why does 66% of the web rely on a single implementation of it? Why aren't there 20 independent open source implementations? Heck, why aren't there 3?
The case has been made elsewhere that we, as a civilization, don't allocate enough resources to security software. This is true, but there is another side to the issue: if you are running something as important as the main implementation of the web's security protocol, it behooves you to gather the resources to do a good job of it. Not doing so is plainly irresponsible.
The "poor OpenSSL developers who give us so much software for so little reward" angle does not work. It is true, and we should thank them! But it is also a huge failing on their part not to have something akin to the Apache Foundation to collect donations and organize efforts (e.g. running bounty programs and CTF challenges). For a project like OpenSSL, fundraising is not a minor side task for the core devs; it is as important as writing high-quality code.
The organizational model of the Apache Software Foundation is different from that of, for example, the average Django app or Rails plugin. And rightfully so.
Projects like OpenSSL should be organized more like the Linux Kernel Organization than like django-compressor, but that does not appear to be the case.
The 'scratch my itch' OSS model does not work for security because it is nobody's itch to go audit an implementation of SSL. For large organizations, auditing implementations is risk minimization at best. For small organizations it is not even an option, since they can only afford to scratch their most important itches.
While it is tempting to propose some large open source organization à la the Apache Foundation, responsible for multiple independent implementations of important protocols, all of those implementations would then share whatever risks are inherent in the way that one organization is run.
The better way, in my humble and grossly uninformed opinion, is to have multiple independent organizations responsible for implementations and try to share as little as possible among them. Minimizing information sharing between open source organizations seems tricky, but having conscientious project leaders whose mission is explicitly to run these so that they provide redundancy and diversity might be good enough.
One problem with this, though, is that the security industry brands itself as utterly impenetrable: don't bother if you are not an expert. This is very detrimental. Crypto is hard, yes, but that doesn't obviate the need to diversify and to encourage people to participate.
Compare this attitude to, say, how we got the Linux kernel: a talented hacker just did it and got into an argument with the experts. Culturally, at the time, there was no sign at the door telling people to go away because they were not experts.
Monoculture is just a special case of monolithic organization, a couple of levels up the abstraction ladder. Monolithic organization is bad; monoculture is worse.
One way or another, achieving redundancy is important.
Blah, blah, blah, talk is cheap. How do we actually do this? Here is a high-level overview of how I'd go about it. For simplicity I am only referring to and thinking about SSL below, but the arguments should apply to any security software project.
Maybe this is because it is all I know, but starting this type of organization as a security non-expert seems very similar to starting a startup. The overarching ideas are that everything needs validation and iteration, and that the project will only work if the people who start it devote roughly equal effort to technology and business. The latter, in this case, is not about validating a market but about validating the project for fundraising.
One aspect in which a project like this differs from a startup is its tolerance for technical debt. A security project has virtually none, whereas startups can afford a lot of it.
Start it as a toy side project, with loud disclaimers that it is not ready to be used.
Pick tools that make it easy to learn the protocol and code quickly. That suggests writing it in a modern scripting language with a rich ecosystem, like Python; however, since it is known from the outset that guaranteeing the correctness of the code is important, it also makes sense to write it in a language like Haskell or Coq. It may even make sense to write a small prototype in a few languages in order to fully consider the trade-offs.
In the case of security software, working in a language without a large ecosystem of libraries is probably an asset in the long run: you'll have to really understand whatever libraries you use, and writing something for your specific use case is often easier than understanding a general implementation.
Start by implementing the smallest subset that makes sense and get something small working quickly. Then start worrying about code and project organization so as to maximize readability and auditability. Those will be difficult to achieve if they aren't baked into the code from the beginning and, in general, readability and auditability should be as big a concern as having a working implementation.
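As an illustration of what "smallest subset" and "auditable from the beginning" might look like, here is a minimal Python sketch of one tiny piece of TLS: parsing a single record. It assumes the TLS 1.2 record layout from RFC 5246 (a 1-byte content type, a 2-byte version, a 2-byte length, then the fragment); `parse_record` and `Record` are hypothetical names, not part of any existing library. Every declared length is checked against the bytes actually received before it is trusted, and any content type outside the deliberately small supported set, such as the heartbeat type (24) from RFC 6520, is rejected outright.

```python
import struct
from typing import NamedTuple, Tuple

# Assumption: only the four core TLS 1.2 content types (RFC 5246, section 6.2.1).
# Anything else, including heartbeat (24), is deliberately unsupported.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
MAX_PLAINTEXT_LENGTH = 2 ** 14  # upper bound on a plaintext record's length field
HEADER = struct.Struct("!BBBH")  # type, version major, version minor, length (network byte order)


class Record(NamedTuple):
    content_type: str
    version: Tuple[int, int]
    fragment: bytes


def parse_record(data: bytes) -> Tuple[Record, bytes]:
    """Parse one record from the front of `data` (hypothetical helper).

    Returns the record and any bytes left over. Raises ValueError on
    anything unexpected instead of trying to recover.
    """
    if len(data) < HEADER.size:
        raise ValueError("truncated record header")
    ctype, major, minor, length = HEADER.unpack_from(data)
    if ctype not in CONTENT_TYPES:
        raise ValueError(f"unsupported content type {ctype}")
    if length > MAX_PLAINTEXT_LENGTH:
        raise ValueError(f"declared length {length} exceeds 2^14")
    end = HEADER.size + length
    # Never trust the declared length: check it against what was received.
    if len(data) < end:
        raise ValueError("record shorter than its declared length")
    record = Record(CONTENT_TYPES[ctype], (major, minor), data[HEADER.size:end])
    return record, data[end:]
```

For example, `parse_record(bytes([23, 3, 3, 0, 5]) + b"hello")` returns an `application_data` record with version `(3, 3)`, fragment `b"hello"`, and no leftover bytes. Failing loudly on anything unexpected keeps each check small enough to audit by eye.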
Sometime during the bootstrap phase I would convince the most talented hacker(s) I know personally to work on this as a side project.
Proceed incrementally, and worry more about the quality of the code and its surrounding tooling (e.g. documentation, tests) than about completeness.
Once the project reaches a certain level of maturity, make fundraising the top priority. The level of maturity needed is whatever allows two things to happen: someone wants to work full-time on the project, and the project can convince third parties to donate money.
It's tempting to set that goal in some concrete way, e.g. “we'll fundraise when we have a full implementation”, but that is a mistake. The exact point at which an organization or person is ready to donate money depends greatly on how the project is being executed. I can also envision Kickstarter campaigns succeeding before any milestone is reached.
The overall business goal here is to incorporate as a 501(c)(3) and to have enough money to pay one maintainer to work full-time on the project.
Shift all focus to completing an implementation of the protocol and producing a toolkit that makes it possible to use it in real, production settings. At this stage, worry only about making production use possible, not advisable, i.e. keep the loud disclaimers that it is not ready.
This is analogous to the 'trough of sorrows' stage of a startup. The milestone of having a complete, usable implementation is very difficult to achieve and resources will be limited. Reaching this milestone may require additional funding, but shifting the project to “fundraising mode” at this stage should be avoided as much as possible.
0.9.0 is the first complete working implementation, the one that contains the fruit of our best efforts.
The next main goal is to secure funding for risk-minimization efforts, namely a bug bounty program, a continuous pwn-the-server contest, and an audit of the source code by an independent third party of high repute (e.g. Matasano).
The 0.9.x releases are the iterations during which problems exposed by audits and bounty programs get fixed.
Run this way, the project could stay in 0.9.x forever and never reach 1.0, the first production-ready release. To prevent this, declare that 1.0 will be reached once a given 0.9.x release has run for a set period of time without any bugs or problems being exposed. The length of that period should be decided by the core developers.
During this phase, organizationally, it is also important to clean up the organization's financial reporting. The finances should make it transparent how money is allocated across paying people to code, paying people to run the organization, and paying for risk-minimization efforts.
The project declares that it is production ready and shifts focus to driving adoption and increasing the level of risk-minimization. By the 1.0 release the organization should be a mature non-profit organization which employs a small team of core developers full-time and possibly a small business team. A headcount higher than ~10 is likely a bug in the organizational processes and should be fixed accordingly.
In the long term, it is important that the organization remain small and, if anything, devote additional resources to helping create competing organizations rather than to growing. It should grow neither in scope nor in headcount.
Heartbleed showed that we are very far from having truly robust security infrastructure. Organizational factors are just as important as technological ones. We can dramatically improve the status quo by innovating and iterating on the organizational structure of security-related open-source projects.