Imaging, Snapchat and mobile

For the first time, pretty much everyone on earth is going to have a camera. Over 5bn people will have a mobile phone, almost all will be smartphones and almost all will have cameras. Far more people will be taking far more photos than ever before – even today maybe 50-100 times more photos are taken each year than were taken on film. 

Talking about ‘cameras’ taking ‘photos’, though, is a pretty narrow way to think about this – rather like calling those internet-connected pocket supercomputers ‘phones’. Yes, the sensor can capture something that looks like the prints you got with a 35mm camera, or that looks like the footage a video camera could take. And, yes, it’s easier to show those images to your friends on the internet than by post, and easier to edit or crop them, or adjust the colours, so it’s a better camera. But what else? Terms like camera or photo, like phone, are inherently limiting – they specify one particular use for underlying technology that can do many things. Using a smartphone camera just to take and send photos is a little like using Word for memos that you used to create on a typewriter – you’re using a new tool to fit into old forms. Pretty soon you work out that new forms are possible. 

So, you break up your assumptions about the models that you have to follow. You don’t have to save the photos – they can disappear. You’re not paying to process a roll of 24 or 36 exposures anymore. You can capture all the time, not just the moment you press the ‘shutter’ button (which, for example, gives us Apple’s Live Photos). The video doesn’t have to be linear – you don’t have to record just the right bits as though you were splicing a mix tape or recording from live radio. You can put text, or images, on top of that video, and it’s not part of the ‘captioning’ section of a video editing program – it’s a basic part of how you use it. Images are just software – they can be anything. Just as the telephony app is just one app on your smartphone, the camera app is just one app for your image sensor, and not necessarily the most important. There are other ways to talk to people beyond calling or texting, and there are other ways to use imaging.  

This change in assumptions applies to the sensor itself as much as to the image: rather than thinking of a ‘digital camera’, I’d suggest that one should think about the image sensor as an input method, just like the multi-touch screen. That points not just to new types of content but new interaction models. You started with a touch screen and you can use that for an on-screen keyboard and for interaction models that replicate a mouse model, tapping instead of clicking. But next, you can make the keyboard smarter, or have GIFs instead of letters, and you can swipe and pinch. You go beyond virtualising the input models of an older set of hardware on the new sensor, and move to new input models. The same is true of the image sensor. We started with a camera that takes photos, and built, say, filters or a simple social network onto that, and that can be powerful. We can even take video too. But what if you use the screen itself as the camera – not a viewfinder, but the camera itself? The input can be anything that the sensors can capture, and can be processed in any way that you can write software for.

In this light, simple toys like Snapchat’s lenses or stories are not so much fun little product features to copy as basic experiments with using the sensor and screen as a single unified input mechanism. It’s a commonplace to say that a smartphone is a piece of glass that becomes whatever app you’re running, but there’s no better way to see that than a Snapchat lens, or perhaps Pokemon Go – the device becomes the sensor and the sensor becomes the app. A fundamental change in going from a mouse to a touch UI is the removal of abstraction – you don’t move your hand there and see a pointer move here and click the button on that. You just touch the thing you want, directly. It’s not indirect and mediated through hardware and UI abstractions any more. With these kinds of apps you look through the phone – it becomes transparent (or tries to), and there’s a further reduction in abstraction.  

As one builds on this, at a certain point you find that you’re no longer making things that you could have made for a desktop PC, and rather, you’re making things that could only work or only make sense on the new platform. But within that, there are things that work better on mobile and things that only work on mobile. There’s really no reason you couldn’t order an Instacart or post a photo to Instagram on a PC – mobile removes friction but isn’t essential to the whole concept. You could port them back to the desktop, but you made them mobile-first or mobile-only. And of course the Facebook newsfeed is entirely a desktop product, ported to mobile with a better revenue model. But we also have products that are only mobile. They’re not mobile-first so much as mobile-native. 

The obvious evolution for this is augmented reality, in the sense not of a Pokemon on a phone screen but a lens you wear: something like Hololens, Magic Leap (an a16z investment), or others yet to come. When these are mature, a virtual object will look pretty much as though it’s really there, certainly allowing for a little suspension of disbelief. At that point you’ve pulled this conversation inside-out: instead of putting a mask on your friend as you look at them through the phone, you’ll put it on them in real life (and might not even tell them). After a decade or so in which mobile phones swallowed physical objects (radios, clocks, music players and of course cameras), AR means you start putting objects back into the real world. Where a smartphone becomes each app that you use in turn, AR can put each of those apps onto the table in front of you. 

Meanwhile, while we can change what a camera or photo mean, the current explosion in computer vision means that we are also changing how the computer thinks about them. Facebook or your phone can now find pictures of your friend or your dog, on the beach, but that’s probably only the most obvious application – more and more, a computer can know what’s in an image, and what it might represent. That will transform Instagram, Pinterest or of course Tinder. But it will also have all kinds of applications that don’t seem obvious now, rather as location enabled lots of unexpected use cases. Really, this is another incarnation of the image sensor as input rather than camera – you don’t type or say ‘chair’ or take a photo of the chair – you show the computer the chair. So, again, you remove layers of abstraction, and you change what you have to tell the computer – just as you don’t have to tell it where you are. Eric Raymond proposed that a computer should ‘never ask the user for any information that it can autodetect, copy, or deduce’; computer vision changes what the computer has to ask. So it’s not, really, a camera, taking photos – it’s more like an eye, that can see.