Codec Avatars and Open Sourcing
Starting on the Codec Avatar Team#
In 2018 a recruiter from the recently rebranded Facebook Reality Labs (formerly Oculus Research) approached me about an opportunity at the Pittsburgh office. I had no clue that Facebook even had an office in Pittsburgh and no idea what the work entailed.
I interviewed at the office and, on the tour, was able to see Mugsy and Sociopticon, two of the large-scale capture systems the lab had built. Seeing the scope and ambition of the work made me even more interested in joining after the interview. I was lucky enough to pass and started at the end of 2018.
But there was one thing that stood out to me more than the capture systems, and it happened during my interview with Yaser Sheikh. At the end, I had the opportunity to ask questions and posed one to Yaser: “why Facebook?” His answer was the openness of Facebook. Yaser and the whole lab would have the ability to continue publishing papers and sharing their work. This kind of open collaboration is very common in academia, and he was happy to see it could continue. That stood out to me compared to every other place I had worked. It is because of this openness that I can even write about this work: everything I mention here comes directly from articles we published or from material we specifically gave others access to.
The Overall Openness of Meta#
At first I found it surprising that Facebook (now Meta) would allow Yaser to continue to publish, but the more I looked, the more I found that Meta was open in a lot of different ways.
First, I learned that Meta has open sourced many projects, including PyTorch and React. The history of continued open source support can be debated, because Meta’s backing of projects like Flow has competed with internal priorities. Pyrefly appears to be focusing on open source from the start, so I’ll be interested to see how that pans out. While not perfect, the fact that several of these technologies have become rather large open source projects shows the approach does work to an extent. More recently, Meta has been releasing Llama models under open-source-ish licenses to help further AI development.
As I continued working in the lab, I started to see how many papers our lab was publishing. I think some years we had a dozen or more papers across big conferences like SIGGRAPH and CVPR. For some reason the publications list hasn’t been updated in almost two years, but it still shows the scope of the papers being published.
I found that there was a concerted effort within Reality Labs Research to let outside journalists in to see the amazing work being done. Personally, I find Norman’s video from Adam Savage’s Tested with the Display Systems Research team to be an amazing deep dive into the limits of VR headsets and well worth the hour.
I experienced this first hand a few months after I started when we opened up the lab to both CBS and Wired. This resulted in two major press events. First, Wired published a detailed article about our lab. Additionally, CBS ran a story during Sunday Morning, where I even made a brief background appearance.
Our lab’s largest effort to work with a journalist would be the Lex Fridman and Mark Zuckerberg interview, which was conducted virtually using Codec Avatars.
Lastly, we got to show off our work on Codec Avatars during the Connect 2024 keynote on the Orion AR glasses.
What Did I Do on the Codec Avatar Team?#
During my time on the team, my main focus was the pipelines handling the vast troves of data. Per the CBS story, the Mugsy system from 2019 was outputting about 180GB/sec of raw data. The amount of data that could be collected by these systems is staggering, as will be shown with the size of the open datasets. It turns out that there is a lot of engineering work required to handle data at this scale.
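To make that scale concrete, here is a quick back-of-envelope calculation. The 180GB/sec figure is from the CBS story; the 30-minute session length is the approximate Ava-256 capture time mentioned later, so treat this as illustrative rather than an exact accounting of what the pipelines stored.

# Back-of-envelope math on Mugsy's data rate. The 180 GB/sec figure is from
# the CBS story; the 30-minute session length is approximate and comes from
# the Ava-256 capture sessions described below.
RAW_RATE_GB_PER_SEC = 180
SESSION_MINUTES = 30

raw_per_session_tb = RAW_RATE_GB_PER_SEC * SESSION_MINUTES * 60 / 1000
print(f"Raw data in one {SESSION_MINUTES}-minute session: ~{raw_per_session_tb:.0f} TB")
# -> roughly 324 TB of raw data per session, versus the ~10 TB per capture
#    that is ultimately kept after selecting what to store and applying
#    lossless compression.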
Codec Avatar Datasets and Papers#
The rest of this post will focus on the datasets, since that is what I worked on and what I co-authored a paper about. There are many other open source releases and publications listed on the Codec Avatar webpage, alongside the datasets I’ll be talking about.
In 2024, we released two datasets, Ava-256 and Goliath-4. Our paper, published at NeurIPS 2024, details how they were generated and the data inside them. I’ll try to summarize it here and give a little commentary as well. If you would like to cite the paper, the information is below; there are a lot of authors because the effort to build these complex systems is massive!
@article{martinez2024codec,
author = {Julieta Martinez and Emily Kim and Javier Romero and Timur Bagautdinov and Shunsuke Saito and Shoou-I Yu and Stuart Anderson and Michael Zollhöfer and Te-Li Wang and Shaojie Bai and Chenghui Li and Shih-En Wei and Rohan Joshi and Wyatt Borsos and Tomas Simon and Jason Saragih and Paul Theodosis and Alexander Greene and Anjani Josyula and Silvio Mano Maeta and Andrew I. Jewett and Simon Venshtain and Christopher Heilman and Yueh-Tung Chen and Sidi Fu and Mohamed Ezzeldin A. Elshaer and Tingfang Du and Longhua Wu and Shen-Chi Chen and Kai Kang and Michael Wu and Youssef Emad and Steven Longay and Ashley Brewer and Hitesh Shah and James Booth and Taylor Koska and Kayla Haidle and Matt Andromalos and Joanna Hsu and Thomas Dauer and Peter Selednik and Tim Godisart and Scott Ardisson and Matthew Cipperly and Ben Humberston and Lon Farr and Bob Hansen and Peihong Guo and Dave Braun and Steven Krenn and He Wen and Lucas Evans and Natalia Fadeeva and Matthew Stewart and Gabriel Schwartz and Divam Gupta and Gyeongsik Moon and Kaiwen Guo and Yuan Dong and Yichen Xu and Takaaki Shiratori and Fabian Prada and Bernardo R. Pires and Bo Peng and Julia Buffalini and Autumn Trimble and Kevyn McPhail and Melissa Schoeller and Yaser Sheikh},
title = {{Codec Avatar Studio: Paired Human Captures for Complete, Driveable, and Generalizable Avatars}},
year = {2024},
journal = {NeurIPS Track on Datasets and Benchmarks},
}
Ava-256#
The Ava-256 dataset is focused on the data required to create an avatar head and then drive that avatar from an HMD (head-mounted display). A total of 256 participants were captured both in Mugsy and with HMDs. Even though the capture sessions are only about 30 minutes, and only part of the data is stored, each capture is still around 10TB even with lossless compression. The full dataset ends up being over 3PB of data.
To make the dataset realistic to download and use, for example by university researchers, multiple releases of varying sizes were generated, from 4TB to 32TB. These are much more realistically downloadable, and the smallest can even be stored on a reasonably priced external hard drive.
Besides the raw images, the dataset also comes with universal encoders and decoders. These are able to take someone’s expression from within an HMD and then drive a generated avatar.
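To give a feel for what “drive a generated avatar” means in practice, here is a minimal conceptual sketch of that data flow. This is not the actual ava-256 API; every class and function name below is a hypothetical placeholder and the models are stubbed out, but the shape of the loop (HMD frames in, expression code out, decoded avatar rendered per identity) matches how the release is described.

import numpy as np

# Conceptual sketch only -- NOT the actual ava-256 API. Every name here is a
# hypothetical placeholder and the models are stubbed out with dummy arrays.

class ExpressionEncoder:
    """Maps HMD camera images to a low-dimensional expression code."""
    def encode(self, hmd_images: np.ndarray) -> np.ndarray:
        # A real encoder is a neural network; we fake a 256-d code here.
        return np.zeros(256, dtype=np.float32)

class UniversalDecoder:
    """Renders an avatar from an identity code plus an expression code."""
    def decode(self, identity_code: np.ndarray, expression_code: np.ndarray) -> np.ndarray:
        # A real decoder outputs geometry and appearance; we return a dummy image.
        return np.zeros((1024, 1024, 3), dtype=np.uint8)

encoder, decoder = ExpressionEncoder(), UniversalDecoder()
identity_code = np.zeros(256, dtype=np.float32)  # stand-in for a learned per-person code

# Driving loop: each batch of HMD camera frames becomes an expression code
# that animates the avatar of whichever identity is currently loaded.
hmd_stream = [np.zeros((3, 400, 400), dtype=np.uint8)]  # stand-in for live HMD cameras
for hmd_frames in hmd_stream:
    expression_code = encoder.encode(hmd_frames)
    rendered_frame = decoder.decode(identity_code, expression_code)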
If you want more details on the Ava-256 dataset there are several resources:
- Ava-256 Dataset Summary
- Ava-256 GitHub
- Note that the GitHub repository does not contain the dataset itself, but provides a script to download the different-sized releases from AWS (the sketch below shows the general idea)
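If you just want a sense of what fetching a release from AWS involves, the sketch below uses boto3 to walk a bucket prefix and pull the objects down. The bucket name and prefix are made up, and the download script in the ava-256 GitHub repo is the supported path; this only illustrates the general pattern of an unsigned S3 download.

import os
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Illustrative only: the bucket and prefix below are hypothetical. Use the
# download script in the ava-256 GitHub repo to fetch the real data.
BUCKET = "example-ava-256-bucket"   # hypothetical
PREFIX = "ava-256/4TB/"             # hypothetical; pick the release size you want
DEST = "./ava-256-4TB"

# Public datasets on S3 are typically fetched with unsigned requests.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
paginator = s3.get_paginator("list_objects_v2")

os.makedirs(DEST, exist_ok=True)
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        local_path = os.path.join(DEST, os.path.basename(key))
        print(f"downloading {key} ({obj['Size'] / 1e9:.1f} GB)")
        s3.download_file(BUCKET, key, local_path)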
Goliath-4#
While driving a head in VR is interesting, the holy grail is a whole body that can be driven in VR. The Goliath-4 dataset aims to provide the data needed to generate this whole-body avatar. It includes head captures similar to Ava-256, but also adds full-body captures in a larger collection system with different clothing styles, hand scans, and phone captures. In total, a single participant generates over 750TB of data, even after lossless compression. Because of this, there are only 4 participants in this dataset. One other key difference between the Ava-256 and Goliath-4 datasets is that the head and body captures are relightable. Instead of uniform lighting, a variable lighting pattern is used during capture, which makes it possible to recover how light is cast on the subject.
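Why do variable lighting patterns make a capture relightable? Light transport is linear, so a frame shot under a known mix of lights is a weighted sum of the images you would get from each light on its own; with enough different patterns you can solve for those per-light images and re-mix them under any new lighting. The toy numpy example below illustrates just that linear-algebra idea; it is not the Goliath pipeline, and the sizes and lighting codes are made up.

import numpy as np

# Toy illustration of why time-varying lighting enables relighting.
# Each captured frame is a linear mix of per-light images:
#   frame_j = sum_i patterns[j, i] * per_light_images[i]
# With enough known patterns, solve for the per-light images, then re-mix.
rng = np.random.default_rng(0)
num_lights, num_patterns, num_pixels = 8, 12, 16

per_light_images = rng.random((num_lights, num_pixels))  # unknowns in a real capture
patterns = rng.random((num_patterns, num_lights))        # lighting codes used per frame
frames = patterns @ per_light_images                     # what the cameras record

recovered, *_ = np.linalg.lstsq(patterns, frames, rcond=None)
assert np.allclose(recovered, per_light_images)

new_lighting = rng.random(num_lights)      # any lighting you want to simulate
relit_image = new_lighting @ recovered     # the subject under that new lighting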
Besides the raw data, the release also includes some results built from it: face models, face encoders, relightable Gaussian heads, relightable hands, and full-body decoders. All of this demonstrates the capabilities of the data and how it can be used to create an immersive avatar for use in VR.
If you want more details on the Goliath-4 dataset, there are several resources:
- Goliath-4 Dataset Summary
- Goliath-4 GitHub
- Note that the GitHub repository does not have the raw dataset, but provides directions on who to contact to gain access to the data. This is handled differently from Ava-256.
Final Thoughts#
While I am no longer at Meta working on Codec Avatars, I still have a huge amount of pride in the work. I cannot wait to see Codec Avatars make it into a product. Having seen them on VR and AR headsets, I can speak to how impressive they are. I think they have a real chance to help bring us closer together in virtual worlds.
I am really thankful that I ended up being a co-author on the above-mentioned paper, and that so many others were as well. It took an entire team to make these datasets a reality, and I think recognizing those efforts is really important. Sharing these datasets will hopefully enable researchers who do not have access to a massive multi-camera collection system to further their research. Those researchers may be the next people working on avatars and making our virtual worlds a little more personal and realistic.