We have free software. We need free databases.

Transcript (more or less) of keynote at RMLL: We have free software. We need free databases. Discussing how to create a software ecosystem by decoupling information from applications with a universal database.

My name is Heather Marsh. I am a writer and a programmer and I have been studying and experimenting with methods of mass collaboration and the technology we use for collaboration for many years now. From 2010 to 2012 I was the administrator and editor in chief of the Wikileaks news site Wikileaks Central where I experimented with tying leaks to current news as a catalyst to create informed action. In 2012, I concentrated more on social media collaboration and I wrote a book called Binding Chaos about the methods of mass collaboration that were used to create mass movements. I am currently writing a book called Autonomy, Diversity, Society about some of the social issues and institutional structures preventing effective mass collaboration, particularly those involving knowledge industries like journalism, science and academia. And I am also developing a universal database and trust network called Getgee which will hopefully help with some of the issues I have been having for years around online collaboration.

So today I would like to first talk about how we collaborate online and then I would like to look at the technology we develop and how that affects collaboration. And then I’d like to review what a universal database is and why we need such a thing. These are all very big topics and I want to leave room for discussion at the end so I am going to touch very lightly on some areas. If you would like to know more about them, you can find me on YouTube or my blog where these topics are explained in more detail, or read the books which are free online, or send me an email.


What was it about the Internet that was so important and world changing?

Very often we think of the answer in terms of communication, but we don’t really communicate with all the people on the Internet on a one to one basis and reach consensus. This is a picture of what mass collaboration usually looks like, both online and offline. This method is called stigmergy as it says on the slide and I have written and talked a lot about stigmergy so you can find more information in my writing if you like, but basically, it is a method of action based collaboration that follows an idea.

If you look at this, there is no formal structure. There is just an idea or a goal that is reflected to everyone. This is the type of mass movement that can happen more or less spontaneously, sometimes very suddenly and very very effectively. Once this idea is released publicly, if people believe in it, they will follow it across cultures and generations and language barriers and it will be truly unstoppable. We have followed stigmergic movements throughout history for mass migrations, for adopting new technologies like pottery or Facebook, or for instilling moral principles or beliefs.

This isn’t the only method that can cause mass migrations and huge collaborative projects. For the last several thousand years, we have headed very relentlessly towards a very hierarchical and controlled structure for collaboration that happens very formally through appointed official channels and uses a system of sticks and carrots with coercion from military and money as rewards and punishments to drive us through these channels and direct our behaviour.

But the Internet allowed a sudden proliferation of stigmergic movements to spring up instantly and globally and scared a lot of powerful people very much and even removed some of them.

Imagine if these swallows were people – you can see why those trying to maintain authority would be very concerned, it doesn’t look like there is any sort of structure or predictability here. Communication on the Internet allowed us to return to this form of swarm movement which is really powerful, but at the same time very scary. In many ways, this looks like a new age of collaboration. We went from small local collaboration to highly structured so-called civilization and this is a whole new thing, even though stigmergy is a method of collaboration that has been with us forever, it has never been at this scale or speed. So it is important we learn how to help these movements guide themselves, and when we design tools that assume we are going to use direct communication or votes or consensus, they are not going to work for us, not for mass movements.

The single biggest factor I’ve found for whether or not someone will participate in a stigmergic action is whether they are sure of the idea leading it. Not whether it affects them, or if it’s simple to grasp or easy or even safe to do. I have created many actions where the audience was completely removed from the people affected or where the action was dangerous or very difficult to understand or even initially believe. None of this mattered. All that mattered in whether the action was a success was whether people could be sure the goal was sound. And the easiest way for someone to prevent action is to sow doubt in the goal. So this initial seed that makes up the idea is the key to every mass movement.

If you look at this as a form of governance, we aren’t going to be governed by people, we are going to be governed by ideas. So our job in information technology is world governance. How we present and filter ideas directs these mass movements far more than politicians do. People will run in the mazes we give them to run in, so when we are designing tools for collaboration and communication it is very important that we get this right and think about the implications of what we are building. We have to allow people the freedom to rise above these mazes of official channels but still protect themselves from being coerced by propaganda and ignorance.


So with our new scary power of mass collaboration we have also seen a change in how powerful people are attempting to coerce these mass movements. The old way, or the most recent way, of directing movements was with hard coercion. You will follow this idea or you will be burned at the stake. And we can put money in there as hard coercion as well because you will work in this mine or you will starve is pretty violent as well. In this structure ideas were very carefully controlled. After the printing press was invented, only those ideas approved by someone with a printing press got mass dissemination and it was much easier to control what was distributed.

Once we had mass communication, on the Internet, that all changed fairly overnight and took a lot of existing power by surprise. So we instantly saw powerful people attempting to use all the usual hard coercive methods to control these ideas that are starting mass movements, like murdering bloggers and censoring the Internet, but it was pretty apparent that this was not going to be a sustainable long term global solution to keep people in power. So for the last several years we have had a huge focus put on what I call seductive coercion, manipulation of how we think and how we react to these ideas. Seductive coercion uses fear, belonging, shunning, all of our most deeply felt emotions to drive us towards or away from ideas.

The first and easiest way to counter ideas is to conflate the idea with a person or ideology. If someone puts up an idea and other people say oh, that’s funded by so and so, or that’s neoliberalism or Marxism, many people will turn away from the idea before they even try to understand it. Alan Turing once described the campaign against him as: Turing believes machines think. Turing lies with men. Therefore machines cannot think. Anyone who has been on social media long enough can recognize his frustration.

Another favourite counter is noise, you can’t follow an idea you never hear so if you have an idea that powerful people don’t like, they can just drown it out. And another counter is confusion. If an idea is at a level of expertise too specialized for the average person to prove whether it’s true or false, someone can just say it doesn’t work or it has been rebutted. If most of the public can’t prove it one way or another, they can just call it fake news. We are overwhelmed right now with seductive coercion as well as noise and confusion on our current social media platforms and the purpose is to try to control these ideas that are all important in seeding stigmergic mass movements.

I don’t think seductive coercion and noise are a long term solution for guiding people away from or towards ideas. I think most of us are thoroughly fed up with fake news and bot farms yelling at us every time we go near the Internet. We need information we can trust for stigmergic organization to work and we need stigmergic organization because that’s the only way we are going to progress on a large scale in the future. Misleading information will encourage people to act against their own interests but a lack of trust in any information will immobilize them or encourage them to blindly follow demagogues or ideology.

We can’t afford to waste time like this and I think even many powerful people are starting to realize that maybe a very easily led public is not such a great thing after all. Theresa May keeps saying “Make no mistake, the fight is moving from the battlefield to the internet.” but she and others are recommending fighting with the old methods of hard coercion, they are calling for even tighter control over information and more official channels. This is an attempt to go backwards and we have burnt that bridge, we have no path open now but the one forward.

So the third method here is auto-coercion, coercion of each other as an informed society, which is a swarm method of reaching consensus. This is where I would like us to head and this is what, in my opinion, we need to be building technology for. We may or may not have reached a technological singularity yet but we have certainly reached a societal singularity. We need to collaborate with others even just to understand the news. We can’t all be experts at everything. So we can’t keep berating voters for not spending all of their time studying everything that affects them or electing politicians and expecting them to have all the answers. It’s impossible. We need to find a better solution. We need technology that allows us to put our faith wisely in information and this is going to require a completely different set of rules than the hierarchical official channels we have used in the past.


Ideas need to be audited and promoted by people qualified to understand them so I use this structure of concentric circles with epistemic communities in the centre and knowledge bridges to assist information flow and auditing. With knowledge bridges, you don’t have to have personal expertise on every aspect of society. All you have to do is have a transparent concentric circle that you can look at, you can see the activity, you can get feedback if necessary, and you can say yes, there are a lot of people auditing, there is a lot of discussion, I trust some of the people in these circles, I trust that they know what they are doing. Unless you don’t, in which case experts can also be created by the system itself as users develop knowledge and reputation and move towards the centre. Disagreements can result in separate concentric circles being formed around the same problem to explore a different solution. If this all sounds familiar, it’s because this is exactly what happens in open source software communities.

Currently, other knowledge communities act like closed, internationally linked, affinity groups at a level of expertise not accessible to the general public. Science, academia and journalism are very far from acting as concentric circles and knowledge bridges integrating ideas back and forth with wider society so stigmergic action rarely results from their work and without stigmergy, their progress is not nearly what it could be. A people with no confidence in either their epistemic communities or their knowledge bridges is a people with no belief in ideas, and with no belief in ideas we will be immobilized or ruled by demagogues or ideology. Science, academia and journalism require two way knowledge bridges, transparency and free information if they are going to truly act as epistemic communities for us all and stimulate and inform mass action and they also require information focused technology to support them in that. So let’s look at where we went wrong.

data new oil

We no longer live in a world dominated by either resource capitalism or industry. We live in a world dominated by information capitalism and information control.

“Data is the new oil” is a quote that has been going around investment circles for over 10 years now but data is more than just a product. With oil you could acquire money to drive people along the paths of hard coercive structures. With data you can lead far more people with seductive coercion or block them with noise and confusion. But even for people who haven’t quite grasped that information in this form can be used to wag the tail of an entire mass of humanity behind it, most people realize data is lucrative.

The last I checked, Alibaba was worth 265 billion US dollars. Amazon is worth over 400 billion. Even Uber is still worth 60 billion. Who has any idea what Facebook and Google are worth, because they have actually mastered the true value of information. What makes all of these corporations so powerful and valuable is their control of information. When the world wide web was designed, it was a picture of academia. It was meant to allow isolated papers to cite other isolated papers, but the internet does not look like that. Those early pages have been used to create an Internet as a series of sealed wells. Even if we have access to everything on the surface, we do not have access to the information in these wells and to add insult to injury, we created all the information in those wells. And that data is used for public manipulation and seductive coercion, in ways ranging from monitoring our shopping habits to Facebook deciding what mood we are in or whether we vote. That data under corporate control decides whether we are talking about political failures or Romphims and whether the resulting stigmergic actions are related to community support or consumerism.

No one should be gifting their innermost thoughts to allow coercion of public opinion by undemocratic entities who have only maximum profit for their shareholders as a guiding principle. No one should risk storing their personal data on a platform that sees their data as ‘the new oil’. Of course, this is not news to anyone here, but even though we all realize this, we haven’t been able to stop it.

The failure to replace the existing data mining platforms is partly the failure to differentiate between different types of data and their different requirements, so let’s go over that first.

information types

Personal data: The goal here is security against dissemination. Ideally, we want to keep this off the Internet and if that is not possible, encrypt it and keep it under your control and easily deleted.

Personal messaging: The goal here is to know who you are talking to and keep your conversation private, so we treat this like personal data and add the fact that we need to be sure of who we are talking to.

Personal information and private messaging have both had a lot of investment and thought put into them over the years. If we can get this data split out from the corporate data wells, we have a lot of options for this, like personal online data storage, and many similar options. These work but …

personal data

The idea here is you keep all your personal information in your own little PODS and it is always under your control and you choose where to share it and what to share, which is as it should be. But in a way this reminds me of when people were told they no longer had to work as slaves or serfs because they were free and could choose when to work for money. But then they found out that all the food and shelter was controlled by the money sources so it became not much of a choice, work or starve. And if all we do is take control of our own personal information, if we want to access other information and it is controlled by corporate interests, we will end up giving over our keys. And there will be no legal protection because we will be doing it voluntarily. Supposedly. We see this already in data access, we have the ability to block ads or cookies but then we get our access to information blocked so we don’t. So it is not enough to address personal data, we have to address public data as well.

The answer for public data is the same as personal data, we need to decouple application software from the data to regain control and we need some sort of a data commons. There is a huge wealth of data on the internet right now that is not personal data but it shouldn’t be corporate data either, it is our information that we have produced collaboratively over many years. If we are ever going to move towards an open information system with concentric circles and knowledge bridges and collaboration open to everyone, we are going to have to achieve this, off a corporate web platform, so let’s look at our data types again.

Personality focused we don’t really need to look at, we’ve got that covered. The goal in a personality focused platform is promotion of personalities (or brands), so official blue checks and followers assembled around social media reality shows. Social media is almost universally personality focused and that is why social media is so frustrating for anyone attempting to collaborate around information.

With Information focused data, the goal is research, auditing and dissemination of information. At best we can use Wikipedia, mainstream media sometimes and specialized research platforms. There is a huge need for information centred solutions, especially when you consider how many people are trying to facilitate this work on personality focused social media and how frustrating this is. Social media was created in a time when we left all information focused work to the experts, but Twitter especially immediately became a place where everyone got involved and tried to create concentric circles of auditing, feedback and amplification, but because it is focused around personalities instead of information it is forever frustrating for this use.

Public data. The goal here is freedom from censorship or other deletion or modification. So the dead opposite of what should be the goals for personal data and if someone is offering you the solution to both on one platform, run as fast as you can. Most applications with a primary goal of making data public use peer to peer with or without blockchain, or censorship resilient platforms of some sort. It is very easy to put up public data in a very resilient and even semi-immutable way on a peer to peer system.

The problem is, if you combine this need with Broadcasting we have an added goal of wide dissemination. You can put information up, but getting a lot of people to find it is difficult unless you have what has traditionally been used: some sort of index that will search through this data and offer some sort of centralized list. The last time someone did that on a large scale they called it Google. As soon as you have a centralized index like that, your data can be as decentralized as you like, all that means is you get to pay for hosting. If you still can’t be found except by going through the indexing server then we still have a problem of centralization. There are other methods of finding each other, but with other issues, which we’ll get to under collaboration.

Broadcasting is also where we get too much misinformation, spam, noise, confusion, seductive coercion, all those things we talked about earlier. The way many information producers and politicians, and sadly some regulatory bodies, want to tackle this is to attach DRM at the data level with the goal of preventing copying or facilitating built in micro-payments. This is again an attempt to take us back to that highly regulated structure that we already burned the bridge from and the implications in this case are too seriously dystopian for me to even start to get into here.

If we want a world where the general public is both participating in and trusts expert information – and we do, the lack of an informed public is the biggest issue facing us today – we need to get away from this idea that anyone is going to own information or the access to it. Information has to be a basic human right for all, it is our key to understanding our world. There is no point in being able to vote if we don’t understand what we are voting for.

Another point we have to deal with for public data is that the typical peer to peer application is Read only, for fairly obvious reasons. If you want to design a Collaborative platform that can scale, your concerns are going to be things like latency compensation, optimistic UI, speed, and the ability for multiple simultaneous editors. All these goals of a seamless collaborative front end performance do not correlate with a back end trying to serve peer to peer data. So broadcasting collaborative public data is a tough one.

Then we have the whole structural issue. A lot of people feel the solution to the data mining wells we saw earlier is federation or a form of decentralization that allows everyone to run their own instance of an application. First of all, we can all stand up our own wikis but Wikipedia exists, centralization happens. Second, the goal here is to escape dependency on one server or platform, but most of these solutions just create multiple little centralized servers or platforms, each with their own tyrannical or benevolent admins. They also make your data impossible to delete if they are linked to other instances and by eliminating central control they also eliminate central responsibility that you can complain to.

These alternatives are usually addressing technical issues and ignoring the personal ones. A microblogging instance, a sub-reddit and an IRC channel are all technically very different but they all feel very alike and they are all trying to be all things to all people. Their users do not have control of the data, but it isn’t truly public either. They aren’t the right choice for private messaging, but neither are they the best choice for public broadcasting. They are personality focused and hopeless for information gathering but they are not a good celebrity vehicle either. They are decentralized by server (theoretically) but the data is not decoupled from software. They don’t support mass collaboration and they create thought bubbles where outside opinions aren’t welcomed.

So Decoupled data is where I think we need to go before anything to handle all the very different and frequently opposing requirements of these different types of information. The goal here is freedom from corporate ownership of data, freedom from software dependency, data reusability and versatility of use. Data is separated from application software and is agnostic to what applications are used to access it. For this we need a universal database which is why I am working on that.

Our greatest need is for a collaborative information commons, for open journalism, for open science, and just for fun. We need a place where the data is not personal data but it is not corporate data either. We need a place where the application software is decoupled from the data but the data is all still linked. And I really believe that this isn’t a “here is a great marketing opportunity” need, it is a “will we avoid human extinction” need. We are completely programmed by the information we receive and if we want to avoid errors on a massive scale we need to provide our new mass movements with accurate information that they can trust.

Now we have covered what we need and why we need it, all we have to do is design it with the technology available to us today.

core data

We need a set of core data objects to be in a public data commons. These are the objects that link all of our information together. In my research with existing applications, there are five types of data object that we deal with which are Person, Event, Organization, Idea and Thing. In addition to this there are media objects which we use as sources for our information. So our public, universally accessible, read only, data consists of these objects.
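As a rough illustration only, the five object types and their media sources could be modelled along these lines. The class and field names (`ObjectKind`, `CoreObject`, `MediaObject`, `object_id`) and the example URL are assumptions made for this sketch, not the actual Getgee schema. The only things taken from the talk are the five types, the media objects and the read only nature of the commons.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class ObjectKind(Enum):
    """The five core object types named in the talk."""
    PERSON = "person"
    EVENT = "event"
    ORGANIZATION = "organization"
    IDEA = "idea"
    THING = "thing"


@dataclass(frozen=True)  # frozen: a published object cannot be changed in place
class CoreObject:
    object_id: str  # hypothetical identifier format
    kind: ObjectKind
    name: str


@dataclass(frozen=True)
class MediaObject:
    """A source (article, video, document) that supports what we say."""
    media_id: str
    url: str


conference = CoreObject("evt-rmll-2017", ObjectKind.EVENT, "RMLL 2017")
source = MediaObject("media-1", "https://example.org/keynote")  # placeholder URL

try:
    conference.name = "Something else"
except FrozenInstanceError:
    print("core objects are read only")
```

Freezing the dataclasses is just one way to mimic "read only" inside a single program. A real commons would need to enforce this at the storage level.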

We also have classification standards of different types of these data objects and the possible relationships between them. Again, anyone can create classification trees but once they are used by a third party they can’t be deleted or modified. Once you share an idea you can’t delete it from anyone else’s brain, it is now commons data.
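A minimal sketch of that rule, assuming a tree is identified by its owner and becomes locked the first time anyone other than the owner uses it. The `ClassificationTree` class and its methods are hypothetical names for illustration, and treating "adding a branch" as a modification is also an assumption.

```python
class ClassificationTree:
    """A classification tree that freezes once a third party uses it."""

    def __init__(self, owner, name):
        self.owner = owner
        self.name = name
        self._children = {}    # parent label -> list of child labels
        self._used_by = set()  # everyone who has referenced this tree

    @property
    def locked(self):
        # Locked as soon as anyone other than the owner has used the tree.
        return any(user != self.owner for user in self._used_by)

    def add(self, parent, child):
        if self.locked:
            raise PermissionError("tree is in use by others and is now commons data")
        self._children.setdefault(parent, []).append(child)

    def children(self, parent):
        return list(self._children.get(parent, []))

    def use(self, user):
        self._used_by.add(user)


tree = ClassificationTree(owner="alice", name="event types")
tree.add("event", "conference")
tree.use("alice")               # the owner using her own tree does not lock it
tree.add("conference", "keynote")
tree.use("bob")                 # a third party now depends on it
```

After the last line, any further call to `tree.add(...)` raises `PermissionError`, which mirrors the idea that a shared classification can no longer be taken back.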


Universal data objects are pretty meaningless if we don’t actually say anything about them. So constellations are a collaborative space where we establish relationships between data objects, link media sources to support the relationships and classify data objects and their relationships. This is our own autonomous space but if you click on a universal data object it will show you all the constellations that have used that data object. If you clicked on an event node for this conference, it could show a constellation for all free software conferences this year, all software conferences in France, all events in St Etienne in July, any constellation that included this event node would be listed. So even though this is a collaborative space which allows you to decide who you want to work with, the results are transparent and easily found by anyone looking at any topics that you’ve referenced.
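The lookup described here, from a data object to every constellation that uses it, is essentially a reverse index. The following is a sketch under assumed names (`Relationship`, `Constellation`, `Commons`, and the identifiers in the example are all invented for illustration).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    subject_id: str      # id of a core data object
    relation: str        # e.g. "held in", "part of"
    object_id: str       # id of another core data object
    sources: tuple = ()  # ids of media objects supporting the claim


@dataclass
class Constellation:
    name: str
    relationships: list = field(default_factory=list)

    def object_ids(self):
        ids = set()
        for rel in self.relationships:
            ids.add(rel.subject_id)
            ids.add(rel.object_id)
        return ids


class Commons:
    """Keeps a reverse index: object id -> names of constellations using it."""

    def __init__(self):
        self._by_object = defaultdict(set)

    def publish(self, constellation):
        for object_id in constellation.object_ids():
            self._by_object[object_id].add(constellation.name)

    def constellations_using(self, object_id):
        return sorted(self._by_object.get(object_id, set()))


commons = Commons()
commons.publish(Constellation("Free software conferences 2017", [
    Relationship("evt-rmll-2017", "is a", "idea-free-software-conference"),
]))
commons.publish(Constellation("Events in St Etienne in July", [
    Relationship("evt-rmll-2017", "held in", "thing-st-etienne", sources=("media-1",)),
]))

print(commons.constellations_using("evt-rmll-2017"))
```

Each constellation stays under its authors' control, while the index makes it discoverable from any object it references.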

Which is nice except we really don’t want to see all the data on the Internet. Neither do we want to leave it up to a centralized index what we see, so we need a trust network to help us filter spam and people we don’t find very knowledgeable and that trust network needs to be under our control. A trust network means if we set our search to 0 degrees of trust we will only see data that we personally have explicitly trusted. If we set it to one degree we will see data trusted by those we trust and so on. This is great for those people who say they want open science and open knowledge but really they are uncomfortable without official accreditation. They can filter their own information by whatever accreditation they choose, but they can’t stop the other information from existing and they can’t stop anyone else from seeing it.
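One way to read "degrees of trust" is as a bounded breadth first walk over who trusts whom. The sketch below assumes two plain dictionaries, `trusts` (person to the people they trust) and `endorsements` (person to the data they have explicitly trusted). Both structures and all names in the example are assumptions for illustration.

```python
from collections import deque


def trusted_people(trusts, me, degrees):
    """Everyone reachable from `me` within `degrees` hops of trust, including `me`."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == degrees:
            continue  # do not look past the chosen number of degrees
        for other in trusts.get(person, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return seen


def visible_data(trusts, endorsements, me, degrees):
    """Data explicitly trusted by me or by anyone within `degrees` of me."""
    visible = set()
    for person in trusted_people(trusts, me, degrees):
        visible |= endorsements.get(person, set())
    return visible


trusts = {"me": {"ana"}, "ana": {"ben"}}
endorsements = {
    "me": {"doc-1"},
    "ana": {"doc-2"},
    "ben": {"doc-3"},
    "spammer": {"doc-4"},
}

zero = visible_data(trusts, endorsements, "me", 0)  # only what I trusted myself
one = visible_data(trusts, endorsements, "me", 1)   # plus what ana trusted
two = visible_data(trusts, endorsements, "me", 2)   # plus what ben trusted
```

Note that `doc-4` still exists in `endorsements`. It is simply never shown to this user, which matches the point that filtering your own view does not remove information for anyone else.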

This also brings mass collaboration back because we don’t have to personally know everyone we are working with if we are careful with our trust network. If someone sabotages a work project, obviously they are going to be untrusted very quickly by whoever brought them in or that person will be untrusted themselves, but there is no need to reach consensus over who is or isn’t trustworthy and assign them a blue check, we all decide for ourselves whether someone else’s network is worth linking to our own.

So now we have all of this information in a data commons which can be linked together and sourced, so we have transparency and multiple viewpoints. No one can create a thought bubble and filter out information but they can choose not to see it themselves or work with it if their network deems it untrustworthy.

G ecosystem

So now we have broken everything up and we no longer have a web page or an app, we have an ecosystem. At the bottom we have a universal data commons of public data which contains all of our core data objects. Also universal are the classification trees which data analysts can create to help us provide more meaning and classification standards. Then we have our collaborative space where researchers, journalists, scientists, lawmakers, organizers, or anyone else can create meaningful information from all this data without harassment from spam but with full transparency to the public. And we have our personal trust networks where if we choose, we can set filters on what we see.

Now you can search, merge, and filter that data to get only the information you need. Suppose you want a taxi driver. You can merge taxi driver collective constellations, you can search for ones in your area and then you can filter them within two degrees of your trust network. Then you can download that read only galaxy onto your phone, download an app that shows you which of them are near and pays them. And we just took the last sixty billion dollars worth of value from Uber and put local control back with taxi driver collectives. Software applications become simply that, easily replaceable applications which provide some functionality such as paying a taxi driver or buying a product. They have no control over data any more and they can be easily replaced.
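The taxi example can be sketched as three small functions over plain records. Everything here is hypothetical: the record fields (`id`, `city`, `added_by`), the function names, and the assumption that the set of people within two degrees of trust has already been computed elsewhere.

```python
def merge(*constellations):
    """Merge several constellations into one galaxy, keeping one entry per id."""
    galaxy = {}
    for constellation in constellations:
        for entry in constellation:
            galaxy.setdefault(entry["id"], entry)
    return list(galaxy.values())


def search(galaxy, **criteria):
    """Keep entries whose fields match every given criterion."""
    return [e for e in galaxy if all(e.get(k) == v for k, v in criteria.items())]


def filter_trusted(galaxy, trusted):
    """Keep entries added by someone inside the caller's trust network."""
    return [e for e in galaxy if e["added_by"] in trusted]


collective_a = [
    {"id": "driver-1", "city": "St Etienne", "added_by": "ana"},
    {"id": "driver-2", "city": "Lyon", "added_by": "ana"},
]
collective_b = [
    {"id": "driver-1", "city": "St Etienne", "added_by": "ana"},
    {"id": "driver-3", "city": "St Etienne", "added_by": "stranger"},
]
within_two_degrees = {"me", "ana", "ben"}  # assumed to come from the trust network

galaxy = filter_trusted(
    search(merge(collective_a, collective_b), city="St Etienne"),
    within_two_degrees,
)
print([driver["id"] for driver in galaxy])
```

The resulting list is plain data, so any application, whether a map, a payment app or something else, could consume it without owning it.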


Instead of transient news, new information is added to a permanent knowledge repository so it encourages deeper research over trivial updates. There is no need to cut and paste the same news repeatedly if it is all linked. We can have more fluid collaboration between journalists because they retain autonomy and credit for their own work but their research is automatically linked with everyone working on the same topic and it can be combined in galaxies. Because the data is in a usable format instead of just wall of text articles, we can import it to other applications and combine and filter it to create more information. So we get far deeper meaning and more context and usability and collaboration from the same research effort.

Organizations can use constellations and galaxies for dynamic reorganization. The responsible person can change at the constellation level and the change will be instantly reflected in all associated galaxies. It is also easy to plug in collaborative apps at the galaxy or the constellation levels to allow groups to work without outside noise but remain completely transparent to the public.
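A change at the constellation level can show up instantly in every galaxy if galaxies hold references to constellations rather than copies. This is a minimal sketch of that design choice, with invented class names and data. It is not a description of how Getgee stores galaxies.

```python
class Constellation:
    def __init__(self, name, responsible):
        self.name = name
        self.responsible = responsible  # the person currently responsible


class Galaxy:
    def __init__(self, name, constellations):
        self.name = name
        # Store references, not copies, so later changes remain visible.
        self.constellations = list(constellations)

    def responsible_people(self):
        return {c.name: c.responsible for c in self.constellations}


logistics = Constellation("Logistics", responsible="ana")
outreach = Constellation("Outreach", responsible="ben")

conference_team = Galaxy("Conference team", [logistics, outreach])
regional_network = Galaxy("Regional network", [logistics])

logistics.responsible = "chloe"  # one change, made at the constellation level

print(conference_team.responsible_people())
print(regional_network.responsible_people())
```

Both galaxies report the new responsible person without being edited themselves.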

Rather than relying on site reviews and trust algorithms we can rely on our own personal trust networks for recommendations. Local or specialized merchants can create constellations to link each other together in a trust network as well, adding another local layer of accountability and control over industry and the ability to allow regional diversity for local laws or customs.

Instead of relying on NGOs, charities and non-profits, we can use our own trust networks to provide aid directly where it is needed and receive feedback directly from those receiving aid.

We can establish direct trade relationships between communities which will allow consumers to see the immediate impact of our trade choices.

Instead of a closed circle of academia in which paper citations can be reflections of power or reciprocity, Idea nodes can be set up around any topic and all contributions heard.

Principles of a society from constitutions and bills of rights can be easily accessible for every member of society and we can then ensure that all law in that society flows naturally from the accepted root principles. It is also possible to use accepted principles to choose association, for instance to refuse trade with a corporation that refuses to accept certain environmental or human rights practices.

With a universal data commons we can collaborate effectively and intelligently and solve the problems we are facing with far greater speed and accuracy. We can all be much better informed and able to easily see the original sources and all related information on a subject from all perspectives. This will provide us with the ability to be self correcting, even in our mass movements. We can have auto-coercion.

The Internet is being redesigned. This is a moment when we urgently need to design our own Charter of the Forest and establish our rights around our commons information. We had this moment before when the Internet first appeared and it seemed destined to become a corporate controlled platform and we made possibly the best decision in our history by having the Internet be a global commons, which is now recognized as a universal human right, but the only reason access to the Internet is important is because it provides access to information. The only reason freedom of speech is important is that it allows us to transmit information. We need to recognize information itself as a universal human right and write code that facilitates that. I think I’d better stop here so there is hopefully some time for questions but if anyone is interested in this project, or you know someone who may be interested in this project, please contact me.

Questions and answers are not transcribed.

Talk at RMLL 2017

