
Unions and default values in Apache Avro serialization and deserialization

The initial Avro schema (schema/user.avsc) defines a User record with only a name field:

{
  "namespace": "com.bawi.avro.model",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    }
  ]
}

The Maven pom.xml declares the Avro dependency:

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.8.1</version>
        </dependency>

so we can serialize the User data in Java to a user.avro file on disk:

        // Parse the schema and build a generic record that conforms to it
        Schema schema = new Schema.Parser().parse(new File("schema/user.avsc"));
        File avroFile = new File("target/user.avro");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alyssa");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        // try-with-resources closes the writer even if append() throws
        try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            dataFileWriter.create(schema, avroFile); // stores the schema in the file header
            dataFileWriter.append(user);
        }

We can read (deserialize) the User back from disk either in Java:

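        // The writer's schema is taken from the file header; `schema` is the reader's (expected) schema
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
            GenericRecord readUser = null; // passed back to next() so the instance can be reused
            while (dataFileReader.hasNext()) {
                readUser = dataFileReader.next(readUser);
                System.out.println(readUser);
            }
        }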

or with the avro-tools jar, which Maven downloads when it is declared as a test-scoped dependency:

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-tools</artifactId>
            <version>1.8.1</version>
            <scope>test</scope>
        </dependency>

and running it with the tojson argument:

me@MacBook:~/dev/my-projects/my-avro$ java -jar /Users/me/.m2/repository/org/apache/avro/avro-tools/1.8.1/avro-tools-1.8.1.jar tojson target/user.avro
{"name":"Alyssa"}

Next, we add a new favorite_number field to the schema:

{
  "namespace": "com.bawi.avro.model",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": "int"
    }
  ]
}

and run the Java deserialization code on the existing data in user.avro, but against the new schema. We then get:

Exception in thread "main" org.apache.avro.AvroTypeException: Found com.bawi.avro.model.User, expecting com.bawi.avro.model.User, missing required field favorite_number

because favorite_number does not exist in the Avro file.

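Changing the type to a union of int and null is not enough to get rid of the error above: a nullable type is not a default, so the reader still has no value to use for the missing field.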

The solution is to give favorite_number a default value, for example null together with a union type:

    {
      "name": "favorite_number",
      "type": [
        "null",
        "int"
      ],
      "default": null
    }

to get: {"name": "Alyssa", "favorite_number": null}
or add

    {
      "name": "favorite_number",
      "type": "int",
      "default": 0
    }

to get: {"name": "Alyssa", "favorite_number": 0}
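
The behaviour seen so far can be summarised as a simple rule: for each field of the reader's schema, use the value from the file if it was written, otherwise use the declared default, otherwise fail. The sketch below models that rule in plain Java, without Avro on the classpath. It is a simplified illustration, not Avro's actual implementation; FieldResolutionSketch, ReaderField and resolve are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldResolutionSketch {

    // Hypothetical stand-in for a field of the reader's schema (not Avro's Schema.Field).
    public static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue; // may legitimately be null when the declared default is JSON null

        private ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        public static ReaderField required(String name) {
            return new ReaderField(name, false, null);
        }

        public static ReaderField withDefault(String name, Object defaultValue) {
            return new ReaderField(name, true, defaultValue);
        }
    }

    // Simplified rule: written value first, then the declared default, otherwise an error.
    public static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("missing required field " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The data as it was written with the initial schema: only a name.
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("name", "Alyssa");
        ReaderField name = ReaderField.required("name");

        // A default (null or 0) fills in the field that is missing from the file.
        System.out.println(resolve(written, Arrays.asList(name, ReaderField.withDefault("favorite_number", null))));
        System.out.println(resolve(written, Arrays.asList(name, ReaderField.withDefault("favorite_number", 0))));

        // Without a default the field cannot be filled in, whatever its type is.
        try {
            resolve(written, Arrays.asList(name, ReaderField.required("favorite_number")));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that hasDefault is tracked separately from defaultValue: a default of null is still a default, which is why the nullable union only helps once "default": null is declared.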

Note that placing int as the first type of the union while keeping null as the default value, such as:

    {
      "name": "favorite_number",
      "type": [
        "int",
        "null"
      ],
      "default": null
    }

gives an error:

Exception in thread "main" org.apache.avro.AvroTypeException: Non-numeric default value for int: null

Conversely, we get

Exception in thread "main" org.apache.avro.AvroTypeException: Non-null default value for null type: 0

when null is the first type of the union but the default value is 0:

    {
      "name": "favorite_number",
      "type": [
        "null",
        "int"
      ],
      "default": 0
    }

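Both errors follow from the rule that the default value of a union must match the first type listed in the union, as described in https://avro.apache.org/docs/1.7.7/spec.html#Unions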
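
To make the four combinations above easy to compare, here is a minimal sketch of that check in plain Java. It assumes the rule from the linked spec section (the default corresponds to the first type in the union) and covers only the null and int types used in this post. UnionDefaultSketch and isValidDefault are hypothetical names, not part of the Avro API.

```java
public class UnionDefaultSketch {

    // Simplified check: the default must be a valid value of the FIRST type in the union.
    // Only "null" and "int" are handled, since those are the types used in this post.
    public static boolean isValidDefault(Object defaultValue, String... unionTypes) {
        String first = unionTypes[0];
        switch (first) {
            case "null":
                return defaultValue == null;
            case "int":
                return defaultValue instanceof Integer;
            default:
                throw new IllegalArgumentException("type not covered by this sketch: " + first);
        }
    }

    public static void main(String[] args) {
        // ["null", "int"] with default null: accepted
        System.out.println(isValidDefault(null, "null", "int"));
        // ["int", "null"] with default null: rejected (compare "Non-numeric default value for int")
        System.out.println(isValidDefault(null, "int", "null"));
        // ["null", "int"] with default 0: rejected (compare "Non-null default value for null type")
        System.out.println(isValidDefault(0, "null", "int"));
        // ["int", "null"] with default 0: accepted
        System.out.println(isValidDefault(0, "int", "null"));
    }
}
```

In practice this means the order of the union should be chosen to fit the default we want: ["null", "int"] for a null default, ["int", "null"] for a numeric one.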