Bridging Security Infrastructure Between the Data Center and AWS Lambda

Last week I was able to present at Black Hat USA about infrastructure security work supporting Lambda. Abstract and slides are available here. The talk is partially based on two posts on the Square developer blog: Providing mTLS Identities to Lambdas and Expanding Secrets Infrastructure to AWS Lambda.

Using Lambda extensions to accelerate Secrets Manager access

AWS Lambdas have recently been extended with a new feature that adds a runtime environment before a Lambda is executed: Lambda extensions. We have published a writeup on the Square Developer blog on how we trialed this technology, before it was generally available, to prefetch secrets from Secrets Manager: Using AWS Lambda Extensions to Accelerate AWS Secrets Manager Access. This work has also been covered in the AWS Compute Blog.

Providing mutual TLS Identities to AWS Lambdas

AWS Lambdas have no built-in mechanism for mutual TLS identity - so at Square we built a system that issues SPIFFE-compatible identities to them so they can connect to our service mesh. The writeup is hosted on the Square Developer blog: Providing mutual TLS Identities to AWS Lambdas.

HotFuzz - Fuzzing Java Programs for Algorithmic Complexity Vulnerabilities

Overview

HotFuzz is a project in which we search for Algorithmic Complexity (AC) vulnerabilities in Java programs. These are vulnerabilities that can significantly slow down a program to exhaust its resources with small input. To use an analogy - it is easy to overwhelm a pizza place by ordering 100 pizzas at once, but can you think of a single pizza that would grind the restaurant's gears and bring operation to a halt?

We created a specialized fuzzer that ran on a customized JVM collecting resource measurements to guide programs towards worst-case behavior. We found 132 AC vulnerabilities in 47 top Maven libraries and 26 AC vulnerabilities in the JRE, including one in java.math that was reported and fixed as CVE-2018-1517.

This project is a product of working on the DARPA STAC program (Space/Time Analysis for Cybersecurity). I worked on the genetic algorithm implementation driving towards worst-case behavior, object instantiation, engagements, and paper writing. The paper "HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing" will be published at NDSS in San Diego this year. And yes, the paper is named after the 2007 classic Hot Fuzz.

Algorithmic complexity attacks

The effects of exploiting an AC vulnerability can be similar to a Denial of Service attack. However, typical Denial of Service attacks are linear in behavior - the attacker sends lots of requests that overwhelm the target system. For this project we are interested in attacks where input and effect are disproportionate. For example, making a single request that causes a program to get stuck in a loop, exhausting its CPU. There is some related work such as SlowFuzz or PerfFuzz. However, we decided this topic needs further exploration.
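To make the idea of a small input with an outsized effect concrete, here is a toy Python illustration. It is not one of the HotFuzz findings and the function name is made up for this example: a naive insertion sort receives two inputs of the same size, and the reverse-sorted one forces quadratically more work.

```python
def insertion_sort_steps(values):
    """Sort a copy of values and return how many element shifts were needed."""
    items = list(values)
    steps = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # Shift larger elements to the right until key fits.
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
            steps += 1
        items[j + 1] = key
    return steps


n = 200
best_case = insertion_sort_steps(range(n))          # already sorted input
worst_case = insertion_sort_steps(range(n, 0, -1))  # reverse-sorted input
print(best_case, worst_case)
```

Both inputs have 200 elements, yet the second one needs n * (n - 1) / 2 shifts. An AC vulnerability is the same effect at a scale where the extra work exhausts the target's resources.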

System

As opposed to fuzzing a program's main entry point, we fuzz all methods, treating them as potential entry points. To generate input values for method parameters we use Small Recursive Instantiation (SRI) based on the type signature. We recursively instantiate objects that can be passed into the method invocation; for primitive variables we generate values randomly. However, since uniform distributions are not ideal for Java objects, we use a custom distribution which is described in the paper. These generated inputs form the initial population for our genetic algorithm, where we use CPU utilization as the fitness function driving towards worst-case consumption. This process emulates natural selection: entities in a population are paired and produce cross-over offspring with some mutations. Typically fuzzers do cross-over on the bit level; extending this technique to objects with type hierarchies is a central component of micro-fuzzing.
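The selection, cross-over, and mutation loop can be sketched in a few lines of Python. This is a simplified illustration and not the HotFuzz implementation: individuals are plain integer lists instead of Java objects, the fitness function counts the steps of a naive insertion sort instead of measuring CPU consumption in a modified JVM, and all names and parameters (cost, evolve, the mutation rate) are assumptions made for this example.

```python
import random


def cost(values):
    """Stand-in fitness: number of shifts a naive insertion sort performs."""
    items, steps = list(values), 0
    for i in range(1, len(items)):
        key, j = items[i], i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
            steps += 1
        items[j + 1] = key
    return steps


def crossover(parent_a, parent_b, rng):
    """Single-point cross-over: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate=0.1):
    """Replace each gene with a fresh random value with probability rate."""
    return [rng.randrange(100) if rng.random() < rate else gene
            for gene in individual]


def evolve(population, generations, rng):
    """Keep the fitter half each generation and refill with mutated offspring."""
    for _ in range(generations):
        ranked = sorted(population, key=cost, reverse=True)
        survivors = ranked[: len(ranked) // 2]
        children = []
        while len(survivors) + len(children) < len(population):
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return max(population, key=cost)


rng = random.Random(0)
initial = [[rng.randrange(100) for _ in range(20)] for _ in range(30)]
best = evolve(initial, generations=50, rng=rng)
print(max(cost(ind) for ind in initial), cost(best))
```

Because the fitter half always survives, the best fitness never decreases between generations. Real micro-fuzzing has the harder job of crossing over and mutating typed objects rather than flat lists.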

As a baseline to compare against, we used another instantiation approach called Identity Value Instantiation (IVI), which picks identity values: 0 for integers, the empty string for strings, etc. IVI is the simplest possible value selection strategy and is merely used to assess the effectiveness of SRI.
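The difference between the two instantiation strategies can be illustrated with a Python analog. This is a sketch under several assumptions, not the Java implementation: dataclasses stand in for Java classes, the random branch draws from simple uniform ranges rather than the custom distribution from the paper, and how IVI treats object types is not described in this post, so the sketch simply recurses into fields. Point, Segment, and instantiate are hypothetical names.

```python
import dataclasses
import random
import string

# Identity values used by the IVI-style baseline.
IDENTITY_VALUES = {int: 0, float: 0.0, bool: False, str: ""}


@dataclasses.dataclass
class Point:
    x: int
    y: int


@dataclasses.dataclass
class Segment:
    start: Point
    end: Point
    label: str


def random_primitive(tp, rng):
    """Draw a random value for a primitive type (uniform, for simplicity)."""
    if tp is bool:
        return rng.random() < 0.5
    if tp is int:
        return rng.randint(-1000, 1000)
    if tp is float:
        return rng.uniform(-1000.0, 1000.0)
    if tp is str:
        length = rng.randint(0, 8)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    raise TypeError(f"unsupported type: {tp!r}")


def instantiate(tp, rng=None):
    """Build a value of type tp from its signature.

    Without rng, primitives get identity values (IVI-style baseline).
    With rng, primitives get random values (SRI-style).
    """
    if dataclasses.is_dataclass(tp):
        # Recurse into every field based on its declared type.
        fields = dataclasses.fields(tp)
        return tp(**{f.name: instantiate(f.type, rng) for f in fields})
    if rng is None:
        return IDENTITY_VALUES[tp]
    return random_primitive(tp, rng)


print(instantiate(Segment))                    # identity values
print(instantiate(Segment, random.Random(1)))  # random values
```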

For measurement, microFuzz workers contain a custom JVM called EyeVM. EyeVM is built on top of OpenJDK with a HotSpot VM modified to measure CPU consumption precisely.

Output from micro-fuzzing is passed on to witness synthesis and validation, where a generated program is run and measured on an unmodified JVM. This verification step reduces false positives where fuzzing causes programs to hang, for example while polling a socket or file. The synthesized program calls the target method with help from the reflection API and the Google GSON library, using wall-clock time to measure runtime.
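The validation idea, re-running a candidate input outside the fuzzer and timing it on the wall clock, can be sketched in Python as follows. This is only an analog of the step described above: the real witnesses are synthesized Java programs, and the function name, the threshold, and the injectable clock parameter are assumptions made for this example.

```python
import time


def validate_witness(method, args, threshold_seconds, clock=time.perf_counter):
    """Run method(*args) once and check its wall-clock time against a threshold.

    Returns a tuple (is_witness, elapsed_seconds).
    """
    start = clock()
    method(*args)
    elapsed = clock() - start
    return elapsed > threshold_seconds, elapsed


# A cheap call should stay well under a generous one-second threshold.
is_witness, elapsed = validate_witness(sorted, (list(range(1000)),), 1.0)
print(is_witness, round(elapsed, 6))
```

The clock is passed in as a parameter so that the check can be exercised with a fake clock instead of waiting for a slow run.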

HotFuzz system overview leading from jar file to AC witnesses

Results

We micro-fuzzed the JRE, the top 100 Maven repositories, and challenges provided to us by DARPA. We found 132 AC vulnerabilities in 47 top Maven libraries and 26 AC vulnerabilities in the JRE, including one in java.math.

One finding resulted in CVE-2018-1517, which was fixed by Oracle and IBM in their JRE implementations. HotFuzz found input to BigDecimal.add, part of the arbitrary-precision arithmetic in java.math, that leads to dramatically slower processing. In the case of IBM J9 we measured the process to be stuck for months. The cause of the slowdown lies in how variables are allocated in the library. The bug and more results are presented in detail in the paper.

Other than comparing SRI with IVI fuzzing, a negative result that is not covered in the paper is that we also tried seeding fuzzing input with data from unit tests of libraries. This did not improve fuzzing results, which was a surprise to us.

Summary

HotFuzz is a method-based fuzzer for Java programs that tries to drive input towards worst-case behavior with regard to CPU consumption. We found bugs in popular Maven libraries and one CVE in java.math that led to fixes by Oracle and IBM. The paper will be published at NDSS this year.

Pie Charts in the Land of Confusion: Of Cakes and Camemberts

Overview

The pie chart is a very successful way of displaying proportions. However, it was never properly named by its inventor William Playfair.

It has different names in different languages, apparently often based on food: pie in English, cake in German, pizza in Portuguese, and Camembert in French. This sparked my interest in what else is out there. To find out, I wrote a short program using Google Cloud Translate to translate "pie chart" to 103 different languages and back. I found that there are more odd cases than Camembert, such as "cookies" in Hmong, "skin" in Samoan, or "chart chart" in Kazakh. These results are established via Google Cloud Translate only and not verified by native speakers.

Depiction of a Camembert. Or a cake. What's the difference really? File from Wikipedia: https://en.wikipedia.org/wiki/Camembert#/media/File:Camembert_de_Normandie_(AOP)_11.jpg

Introduction

Pie charts are roughly 200 years old and are a very popular form of displaying proportions. Also, pie charts were not properly named when they were introduced.

Many of the names used for this chart around the world are based on food. For example, in German "pie chart" translates to "Tortendiagramm" which literally translated would be a "cake diagram". What prompted this investigation was this tweet. This post digs a bit into this phenomenon but I was curious whether I can go further. I thought there must be more to pie charts around the world.

Translations

As the year is 2019, all such investigations must involve the cloud. Therefore I used the Google Cloud Translate API to explore the depths of pie charts in 103 different languages.

In a first step I checked how the German and French cases translate back to English. Theoretically, a flawless translation would go from "pie chart" to ? to "pie chart". This is called round-trip translation and is considered a controversial quality metric for automated translation.

However, a literal translation would lose the original meaning and possibly expose different meanings. This is due to limited context - the translation engine makes a best effort picking from multiple possibilities. While "camembert" on its own translates back as "camembert", adding more context helps: "draw me a pie chart" translates to French as "me dessiner un camembert" and back to the identical English sentence. However, these literal translations caused by a lack of context are great for the purpose of this project.

To verify this briefly I translated "pie chart" to German and French - and back. Google Cloud Translation returns "Cake chart" and "Camembert", respectively. (The code for this is in the repository, function "de_fr_only".) As this worked out I went on to translate "pie chart" from English to 103 available languages, and back. The code is available here.
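The round trip itself is a small piece of logic. The following Python sketch is not the repository code: it takes the translation function as a parameter and uses a stub lookup table filled with the German and French results quoted in this post, so it runs without cloud credentials. In practice the translate argument would wrap a call to the Google Cloud Translation client. The names round_trip, stub_translate, and STUB are made up for this example.

```python
def round_trip(text, language, translate, source="en"):
    """Translate text into language and back; return (foreign, back)."""
    foreign = translate(text, source, language)
    back = translate(foreign, language, source)
    return foreign, back


# Stub translations, taken from the results quoted in this post.
STUB = {
    ("pie chart", "en", "de"): "Tortendiagramm",
    ("Tortendiagramm", "de", "en"): "Cake chart",
    ("pie chart", "en", "fr"): "camembert",
    ("camembert", "fr", "en"): "Camembert",
}


def stub_translate(text, source, target):
    """Hypothetical stand-in for a real translation API call."""
    return STUB[(text, source, target)]


results = {lang: round_trip("pie chart", lang, stub_translate)
           for lang in ("de", "fr")}
for lang, (foreign, back) in results.items():
    print(lang, foreign, "->", back)
```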

A note on limitations: I only receive one result per translation. Languages could have several expressions for pie charts, but I will only see one. Furthermore, the translations might simply be imperfect.

Results

The Google Cloud Translation API supports 103 languages other than English. I wrote this program to translate "pie chart" to all of these languages and back.

The results are in the file all_the_pies.json, which is a JSON dictionary with two keys: "pie_to_foreign" and "foreign_to_pie". Each element contains a list "from" and "to", with the language indicator and the expression. This data structure is more verbose than required; a dictionary indexed by two-letter language code would have been sufficient for each. I did it this way as I wasn't sure what I wanted to do with the data later, and 103 languages are not all that much in the grand scheme of things.
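To illustrate the difference between the verbose layout and the compact alternative, here is a Python sketch. The exact shape of all_the_pies.json is an assumption: the sample below guesses that every entry stores a [language code, expression] pair under "from" and "to", and the compact function and its output keys are made up for this example.

```python
import json

# Hypothetical excerpt; the real file layout may differ from this guess.
SAMPLE = """
{
  "pie_to_foreign": [
    {"from": ["en", "pie chart"], "to": ["de", "Tortendiagramm"]},
    {"from": ["en", "pie chart"], "to": ["fr", "camembert"]}
  ],
  "foreign_to_pie": [
    {"from": ["de", "Tortendiagramm"], "to": ["en", "Cake chart"]},
    {"from": ["fr", "camembert"], "to": ["en", "Camembert"]}
  ]
}
"""


def compact(data):
    """Collapse both lists into one dictionary indexed by language code."""
    forward = {entry["to"][0]: entry["to"][1]
               for entry in data["pie_to_foreign"]}
    backward = {entry["from"][0]: entry["to"][1]
                for entry in data["foreign_to_pie"]}
    return {code: {"foreign": forward[code], "back": backward[code]}
            for code in forward}


compact_view = compact(json.loads(SAMPLE))
print(compact_view["de"])
```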

When looking at the data, I noticed that 74/103 of these are pie based; however, only 67 are exactly "pie chart". Greek doubles the effort with "pie pie", Bengali has a "pie image", and Portuguese is hedging its bets with "pizza pie". Furthermore, there is "board pie" in Haitian Creole and "graphic pie" in Corsican, but most on point is Afrikaans: "pie". 4/103 are cake based, paired with 2x "diagram", 1x "chart" and 1x "table". I guess we could say that the pie chart takes the cake!
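A tally like the one above can be produced with a simple word check on the back-translations. The Python sketch below uses only a handful of results quoted in this post rather than the full data set, and the categorize function is a hypothetical helper; it is not necessarily how the counts above were obtained.

```python
from collections import Counter

# A small sample of back-translations quoted in this post.
SAMPLE_BACK_TRANSLATIONS = {
    "Greek": "pie pie",
    "Bengali": "pie image",
    "Portuguese": "pizza pie",
    "Afrikaans": "pie",
    "German": "Cake chart",
    "Kazakh": "chart chart",
    "Samoan": "skin",
    "Hmong": "the cookies",
}


def categorize(expression):
    """Label an expression as pie based, cake based, or other."""
    words = expression.lower().split()
    if "pie" in words:
        return "pie"
    if "cake" in words:
        return "cake"
    return "other"


tally = Counter(categorize(expression)
                for expression in SAMPLE_BACK_TRANSLATIONS.values())
print(tally)
```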

Apart from cakes, pies, and the previously discussed examples, the remaining 24 languages produced something different:

  • card - Estonian
  • cart shape - Yoruba
  • chart chart - Kazakh
  • circular chart - Vietnamese
  • circular diagram - Bulgarian, Persian, Slovenian
  • circular graph - Croatian, Galician
  • diagram - Basque
  • drawing - Hausa
  • glassy table - Tajik
  • Graph of proportions - Romanian
  • Hidden chart - Pashto
  • organizer - Igbo
  • papa map - Maori
  • pastry photo - Azerbaijani
  • Pirate Chat - Sinhala
  • Point plan - Luxembourgish
  • skin - Samoan
  • table - Somali
  • table of the patient - Swahili
  • tablet - Hawaiian
  • the cookies - Hmong

Notably, only "pastry photo" and "the cookies" are food based; all others are not related to food. That is, 22/104 languages supported in Google Translate use an analogy not related to food for their expression of pie charts. While running this set me back 12 cents in cloud compute cost, I would argue that this insight was worth every penny!

Closing Thoughts

The majority of languages seem to relate pie charts to food, and within that mostly to pies. There are several notable exceptions that might seem obscure to English speakers. However, it remains an open question whether a proper name given by Playfair to his creation 200 years ago would have led to a less diverse naming situation for this chart. Maybe a good takeaway is to name inventions and systems straight away, as opposed to letting others name them.

As some might notice, this post lacks any actual pie charts. If you have been reading until here to see a pie chart, you have been tricked!