Unions and default values in Apache Avro serialization and deserialization

The initial Avro schema (schema/user.avsc) defines a User record with only a name field:

{
  "namespace": "com.bawi.avro.model",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    }
  ]
}

The Maven pom.xml declares the avro dependency:


so we can serialize the User data in Java to the target/user.avro file on disk:

        Schema schema = new Schema.Parser().parse(new File("schema/user.avsc"));
        File avroFile = new File("target/user.avro");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alyssa");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        // try-with-resources flushes and closes the writer, so the file is complete
        try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            dataFileWriter.create(schema, avroFile);
            dataFileWriter.append(user);
        }

We can read (deserialize) the User back from disk either in Java:

        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
            GenericRecord user = null;
            while (dataFileReader.hasNext()) {
                // pass the previous record in so Avro can reuse the object
                user = dataFileReader.next(user);
                System.out.println(user);
            }
        }

or with the avro-tools jar, which Maven downloads once it is declared as a test dependency:


and running it with the tojson argument:

me@MacBook:~/dev/my-projects/my-avro$ java -jar /Users/me/.m2/repository/org/apache/avro/avro-tools/1.8.1/avro-tools-1.8.1.jar tojson target/user.avro

Then we add a new favorite_number field to the schema:

{
  "namespace": "com.bawi.avro.model",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": "int"
    }
  ]
}

If we now run the deserialization Java code on the existing data in user.avro, but with the new schema, we get:

Exception in thread "main" org.apache.avro.AvroTypeException: Found com.bawi.avro.model.User, expecting com.bawi.avro.model.User, missing required field favorite_number

because the data in user.avro has no favorite_number value and the new schema gives it no default.

Changing the type to a union of int and null, without a default value, does not get rid of the error above.

The solution is to give favorite_number a default value, e.g. a union of null and int with null as the default:

{
  "name": "favorite_number",
  "type": ["null", "int"],
  "default": null
}

to get: {"name": "Alyssa", "favorite_number": null}
or give the plain int type a default value:

{
  "name": "favorite_number",
  "type": "int",
  "default": 0
}

to get: {"name": "Alyssa", "favorite_number": 0}

Please note that listing int as the first type of the union while using null as the default value, such as:

{
  "name": "favorite_number",
  "type": ["int", "null"],
  "default": null
}

gives an error:

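Exception in thread "main" org.apache.avro.AvroTypeException: Non-numeric default value for int: null

Similarly, listing null first in the union but using 0 as the default value gives:

Exception in thread "main" org.apache.avro.AvroTypeException: Non-null default value for null type: 0

The default value must match the first type of the union, so a default of 0 requires int to be listed first:

{
  "name": "favorite_number",
  "type": ["int", "null"],
  "default": 0
}

as described in https://avro.apache.org/docs/1.7.7/spec.html#Unions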
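To check the schema resolution without creating a file, here is a minimal sketch. It assumes the avro dependency from the pom.xml is on the classpath. It writes a User with the old schema into an in-memory byte array (plain binary encoding rather than the DataFileWriter container file used above) and reads it back with the new schema as the reader schema. The class name UnionDefaultDemo and its helpers newSchema and roundTrip are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class UnionDefaultDemo {

    // Old (writer) schema: only the name field, as in schema/user.avsc.
    static final String OLD_SCHEMA = "{\"type\":\"record\",\"name\":\"User\","
            + "\"namespace\":\"com.bawi.avro.model\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";

    // New (reader) schema: name plus the given favorite_number field definition.
    static String newSchema(String favoriteNumberField) {
        return "{\"type\":\"record\",\"name\":\"User\","
                + "\"namespace\":\"com.bawi.avro.model\","
                + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
                + favoriteNumberField + "]}";
    }

    // Serializes a User with the old schema, then deserializes it with the new schema.
    static GenericRecord roundTrip(String readerSchemaJson) throws IOException {
        Schema writerSchema = new Schema.Parser().parse(OLD_SCHEMA);
        Schema readerSchema = new Schema.Parser().parse(readerSchemaJson);

        GenericRecord user = new GenericData.Record(writerSchema);
        user.put("name", "Alyssa");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(user, encoder);
        encoder.flush(); // the binary encoder is buffered

        // Schema resolution: fields missing from the written data get the reader's default.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(newSchema(
                "{\"name\":\"favorite_number\",\"type\":[\"null\",\"int\"],\"default\":null}")));
        System.out.println(roundTrip(newSchema(
                "{\"name\":\"favorite_number\",\"type\":\"int\",\"default\":0}")));
    }
}
```

Passing a field definition without a default, e.g. only "type": ["null", "int"], should fail with the missing required field error shown earlier. The exact exception and where it is thrown may differ between Avro versions, since newer releases can also validate defaults while parsing the schema.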

curl: download a file from FTP with user and password

me@MacBook:~/Downloads$ curl -OLv -u myusername:mypassword ftp://myftphost/myfile
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying myftpip...
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to myftphost (myftpip) port 21 (#0)
< 220 FTP server ready
> USER myusername
< 331 Password required for myusername
> PASS mypassword
< 230 User myusername logged in
< 257 "/" is the current directory
* Entry path is '/'
* Connect data stream passively
* ftp_perform ends with SECONDARY: 0
< 229 Entering Extended Passive Mode (|||44363|)
* Trying myftpip...
* Connecting to myftpip (myftpip) port 44363
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0* Connected to myftphost (myftpip) port 21 (#0)
< 200 Type set to I
> SIZE myfile
< 213 173938503
> RETR myfile
< 150 Opening BINARY mode data connection for myfile (173938503 bytes)
* Maxdownload = -1
* Getting file with size: 173938503
{ [1448 bytes data]
99 165M 99 164M 0 0 3881k 0 0:00:43 0:00:43 --:--:-- 3925k* Remembering we are in dir ""
< 226 Transfer complete
100 165M 100 165M 0 0 3885k 0 0:00:43 0:00:43 --:--:-- 4004k
* Connection #0 to host myftphost left intact

My text editors for XML and JSON – Sublime Text and Notepad++

Notepad++ – install the 32-bit version, since the 64-bit version does not support Plugin Manager and some other plugins. Then go to Plugins/Plugin Manager and install XML Tools. For JSON, go to '?' -> Get more plugins, search for 'Json Viewer', download the zip file, extract it and copy the dll file to the C:/Program Files (x86)/Notepad++/plugins directory

Sublime Text 3 – make sure 'Package Control' is installed: go to https://packagecontrol.io/installation, copy the installation code, open View -> Show Console, paste the code and restart Sublime. Then go to Preferences/Package Control, type 'install package', hit enter, type xpath to list the plugins and choose 'xpath' and 'Indent-xml'

grep, egrep and sed: match and replace part of the text

svn log $url -r {2016-12-01}:{2016-12-30} --search Bartosz
r324 | bartosz@mycompany.com | 2016-12-14 17:11:44 +0100 (Wed, 14 Dec 2016) | 1 line

created mybranch

expected output:

http://svn.dev.mycompany.com/svn/myproject/branches/mybranch/?p=324 | bartosz@mycompany.com | 2016-12-14 17:11:44 +0100 (Wed, 14 Dec 2016)

which is produced by (with $url set to the branch URL):

svn log $url -r {2016-12-01}:{2016-12-30} --search Bartosz | egrep '^r[0-9]+' | sed "s|\(^r\)\([0-9]*\)|$url/?p=\2|" | sed 's/ | [0-9]* line.*$//'


egrep '^r[0-9]+'

is equivalent to grep -E (extended regular expressions) and keeps only the lines that begin with r followed by the revision number, e.g. r123

sed "s|\(^r\)\([0-9]*\)|$url/?p=\2|"

– uses | instead of / as the sed delimiter (the URL itself contains slashes), keeps only the revision number (dropping the leading r) and prepends the URL, which the shell expands because of the double quotes: 'r123' -> 'http://svn.dev.mycompany.com/svn/myproject/branches/mybranch/?p=123'

sed 's/ | [0-9]* line.*$//'

– removes the trailing part of the line, e.g. ' | 1 line'
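The same three stages can be sketched in Java with java.util.regex, e.g. to post-process svn log output inside a program. SvnLogLinks.toLinks is a hypothetical helper that mirrors the pipeline stage by stage. Because the Java regex captures only the digits, the replacement uses $1 where the sed command uses \2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SvnLogLinks {

    // egrep '^r[0-9]+': keep only lines that start with 'r' followed by digits
    private static final Pattern REVISION_LINE = Pattern.compile("^r[0-9]+");

    static List<String> toLinks(List<String> logLines, String url) {
        // quoteReplacement stops '$' or '\' in the URL from being read as group references
        String replacement = Matcher.quoteReplacement(url) + "/?p=$1";
        return logLines.stream()
                .filter(line -> REVISION_LINE.matcher(line).find())
                // first sed: 'r324' -> '<url>/?p=324'
                .map(line -> line.replaceFirst("^r([0-9]*)", replacement))
                // second sed: drop the trailing ' | 1 line'
                .map(line -> line.replaceFirst(" \\| [0-9]* line.*$", ""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "r324 | bartosz@mycompany.com | 2016-12-14 17:11:44 +0100 (Wed, 14 Dec 2016) | 1 line",
                "",
                "created mybranch");
        toLinks(log, "http://svn.dev.mycompany.com/svn/myproject/branches/mybranch")
                .forEach(System.out::println);
    }
}
```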

Docker: Cloudera QuickStart

me@MacBook:~$ docker pull cloudera/quickstart:latest
latest: Pulling from cloudera/quickstart
1d00652ce734: Pull complete
Digest: sha256:f91bee4cdfa2c92ea3652929a22f729d4d13fc838b00f120e630f91c941acb63
Status: Downloaded newer image for cloudera/quickstart:latest

docker images

docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888:8888 -p 80:80 -p 7180:7180 --name quickstart.cloudera -d 4239cd2958c6 /usr/bin/docker-quickstart

me@MacBook:~/dev/env$ docker ps
da0050c427c3 4239cd2958c6 "/usr/bin/docker-quic" 3 hours ago Up 3 hours 0.0.0.0:80->80/tcp, 0.0.0.0:7180->7180/tcp, 0.0.0.0:8888->8888/tcp quickstart.cloudera

docker attach quickstart.cloudera

ctrl + p, ctrl + q (detach from the container's tty without stopping the container)

docker start quickstart.cloudera

me@MacBook:~$ docker stop -t 120 quickstart.cloudera


Backup with rsync

1. Initial source folder structure:

me@MacBook:~/tmp$ find source

2. Let's back up the source folder to the destination folder using rsync:

me@MacBook:~/tmp$ rsync -abvP  --backup-dir=backup_`date +%Y-%m-%d--%H-%M-%S` --include=*.orig.vhd --exclude={/backup_*,.DS_Store,*.vhd} --delete source/ destination/
building file list ... 
6 files to consider
created directory destination
           1 100%    0.00kB/s    0:00:00 (xfer#1, to-check=4/6)
           1 100%    0.98kB/s    0:00:00 (xfer#2, to-check=3/6)
           1 100%    0.98kB/s    0:00:00 (xfer#3, to-check=1/6)
           1 100%    0.98kB/s    0:00:00 (xfer#4, to-check=0/6)

sent 343 bytes  received 120 bytes  926.00 bytes/sec
total size is 4  speedup is 0.01



--include=*.orig.vhd

do not exclude *.orig.vhd files in any source (sub)directory: rsync applies the first filter rule that matches, and since this include rule comes before the exclude rules, these files are included even though they also match *.vhd


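--exclude={/backup_*,.DS_Store,*.vhd}

(expanded by bash into three --exclude options) copy all files except backup_* entries at the root of the transfer (the leading / anchors the pattern, so it matches the backup folders directly inside source/ and destination/), and except .DS_Store and *.vhd files in any (sub)directory. Excluded files in the destination are not deleted by --delete (unless --delete-excluded is used), so this also keeps the backup folders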
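The "first matching rule wins" behaviour can be illustrated with a simplified model. This is a sketch only. RsyncFilterModel is a hypothetical class, java.nio glob matching stands in for rsync's own wildcard matcher, and only two pattern kinds are modeled: patterns without '/' (matched against the file name at any depth) and patterns with a leading '/' (anchored at the root of the transfer).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RsyncFilterModel {

    // One include (+) or exclude (-) rule with an rsync-like pattern.
    static final class Rule {
        final boolean include;
        final boolean anchored;
        final PathMatcher matcher;

        Rule(boolean include, String pattern) {
            this.include = include;
            // a leading '/' anchors the pattern at the root of the transfer
            this.anchored = pattern.startsWith("/");
            String glob = anchored ? pattern.substring(1) : pattern;
            this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        }

        boolean matches(String relativePath) {
            Path path = Paths.get(relativePath);
            // unanchored patterns (no '/') are compared with the last path component only
            return matcher.matches(anchored ? path : path.getFileName());
        }
    }

    // The first matching rule decides; a path that matches no rule is transferred.
    static boolean isTransferred(List<Rule> rules, String relativePath) {
        for (Rule rule : rules) {
            if (rule.matches(relativePath)) {
                return rule.include;
            }
        }
        return true;
    }

    // Same order as --include=*.orig.vhd --exclude={/backup_*,.DS_Store,*.vhd}
    static List<Rule> backupRules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(true, "*.orig.vhd"));
        rules.add(new Rule(false, "/backup_*"));
        rules.add(new Rule(false, ".DS_Store"));
        rules.add(new Rule(false, "*.vhd"));
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = backupRules();
        for (String path : new String[] {"dev/disk.orig.vhd", "dev/disk.vhd", "backup_old", "dev/backup_old"}) {
            System.out.println(path + " -> " + (isTransferred(rules, path) ? "copied" : "skipped"));
        }
    }
}
```

Moving the include rule after the *.vhd exclude in backupRules would make *.orig.vhd files skipped, which is why the order of the options on the command line matters.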

3. The destination directory content:

me@MacBook:~/tmp$ find destination

4. Let's modify one file, delete another and run the same rsync command again:

me@MacBook:~/tmp$ echo "2" > source/a 
me@MacBook:~/tmp$ rm source/dev/x
me@MacBook:~/tmp$ rsync -abvP  --backup-dir=backup_`date +%Y-%m-%d--%H-%M-%S` --include=*.orig.vhd --exclude={/backup_*,.DS_Store,*.vhd} --delete source/ destination/
building file list ... 
5 files to consider
deleting dev/x
           2 100%    0.00kB/s    0:00:00 (xfer#1, to-check=3/5)

sent 195 bytes  received 54 bytes  498.00 bytes/sec
total size is 4  speedup is 0.02
--backup-dir=backup_`date +%Y-%m-%d--%H-%M-%S`

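creates a backup folder holding the previous versions of modified or deleted files (backups are enabled by the -b in -abvP); since the path is relative, rsync creates it inside the destination directory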
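The folder name comes from `date +%Y-%m-%d--%H-%M-%S`. If the backup were driven from Java instead of the shell, the same name could be built with java.time. BackupDirName is a hypothetical helper shown only to make the format explicit.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class BackupDirName {

    // Same layout as backup_`date +%Y-%m-%d--%H-%M-%S`
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("'backup_'yyyy-MM-dd'--'HH-mm-ss");

    static String backupDirName(LocalDateTime time) {
        return FORMAT.format(time);
    }

    public static void main(String[] args) {
        System.out.println(backupDirName(LocalDateTime.now()));
    }
}
```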


--delete

deletes files from the destination folder if they were deleted from the source folder
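How -b, --backup-dir and --delete work together can be modeled in memory. This is a simplified sketch. BackupSyncModel is hypothetical, files are a map from relative path to content, content equality stands in for rsync's default size-and-modification-time check, and filter rules and directories are ignored. Because filters are ignored, the model assumes the destination does not yet contain earlier backup folders.

```java
import java.util.Map;
import java.util.TreeMap;

public class BackupSyncModel {

    // Returns the destination content after a simplified
    // rsync -b --backup-dir=<backupDir> --delete source/ destination/
    // Backed-up copies are stored under "<backupDir>/<path>".
    static Map<String, String> sync(Map<String, String> source,
                                    Map<String, String> destination,
                                    String backupDir) {
        Map<String, String> result = new TreeMap<>(destination);
        for (Map.Entry<String, String> entry : destination.entrySet()) {
            String path = entry.getKey();
            String newContent = source.get(path);
            if (newContent == null) {
                // --delete: gone from the source, so the old copy moves to the backup dir
                result.remove(path);
                result.put(backupDir + "/" + path, entry.getValue());
            } else if (!newContent.equals(entry.getValue())) {
                // modified: the previous version is kept in the backup dir
                result.put(backupDir + "/" + path, entry.getValue());
            }
        }
        // new and modified files are copied from the source
        result.putAll(source);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new TreeMap<>();
        source.put("a", "2");
        Map<String, String> destination = new TreeMap<>();
        destination.put("a", "1");
        destination.put("dev/x", "1");
        System.out.println(sync(source, destination, "backup_2016-10-29--12-26-06"));
    }
}
```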

5. The destination folder now contains backup_2016-10-29--12-26-06 with the previous version of a and the deleted dev/x:

me@MacBook:~/tmp$ find destination

Mac shortcuts

cmd+W – close window
cmd+Q – quit application
cmd+tab – switch applications

alt + cmd + space – open Finder (search window)
cmd + shift + m – toggle zoom: maximize/minimize window
cmd + m – minimize the window to the Dock (to un-minimize: select the app with cmd + tab and, while still holding cmd, press alt, then release cmd)
F5 for Reload This Page – System Preferences/Keyboard/Shortcuts/App Shortcuts/+ to add the Google Chrome application, title Reload This Page, shortcut F5
F11 – show desktop
F3 – show Mission Control (all windows) or 3 fingers swipe up
ctrl + down arrow – show application windows (App Exposé) or 3 fingers swipe down
ctrl + up arrow – show Mission Control
ctrl + F8 + down arrow + down arrow + enter – lock screen (Utilities/Keychain Access/Preferences/General: check Show keychain status in menu bar)

alt + right/left arrow – move one word forward / backward
ctrl + A / E (or fn + shift + left/right arrow) – move to the beginning / end of line

TextEdit/Browser (alt and cmd):
alt + right/left arrow – move cursor one word forward/backward
cmd + right/left arrow – move to end/beginning of line (also ctrl + right/left when the Mission Control shortcuts are disabled or remapped)
cmd + up/down arrow – move cursor to the beginning/end of the document
fn + left/right arrow – go to home/end
fn + up/down arrow – page up/down (cursor does not move)
shift + alt + right/left – select one word from the cursor to the right/left
shift + cmd + right/left – select from the cursor to the end/beginning of line
cmd + c / v / x – copy / paste / cut
ctrl + tab – move between tabs in Chrome
cmd + alt + shift + v – paste formatted text but adjust to the format of target text

Vim (fn and ctrl, no cmd, so the same keys work on Windows/Linux):
fn + left/right – move to the beginning/end of line (fn since there are no Home/End keys)
ctrl + right/left – move one word forward/backward (with Mission Control shortcuts disabled/remapped)
cmd + shift + v – paste selected text (needs to be selected via mouse)

My IntelliJ (fn and ctrl, no cmd, so the same keys work on Windows/Linux):
fn + left/right – move to the beginning/end of line (fn since there are no Home/End keys)
ctrl + right/left – move one word forward/backward (with Mission Control shortcuts disabled/remapped)
shift + fn + left/right – select from the cursor to the beginning/end of line
shift + ctrl + right/left – select one word forward/backward (with Mission Control shortcuts disabled/remapped)
ctrl + c / v / x – copy / paste / cut


    1. Turn off AutoCorrect: Apple Menu > System Preferences > Keyboard > Text, and uncheck Correct spelling automatically
    2. Add ssh host autocompletion for bash by appending to ~/.bash_profile or ~/.bashrc:
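_complete_ssh_hosts ()
{
    local cur comp_ssh_hosts
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    # hosts from ~/.ssh/known_hosts (disabled):
#        cat ~/.ssh/known_hosts | \
#        cut -f 1 -d ' ' | \
#        sed -e s/,.*//g | \
#        grep -v ^# | \
#        uniq | \
#        grep -v "\[" ;
    # host aliases from the "Host" lines of ~/.ssh/config
    comp_ssh_hosts=$(cat ~/.ssh/config | \
        grep "^Host " | \
        awk '{print $2}')
    COMPREPLY=( $(compgen -W "${comp_ssh_hosts}" -- $cur))
    return 0
}
complete -F _complete_ssh_hosts ssh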
    3. Terminal prompt structure and colors:
export PS1="\[\033[01;32m\]\u@MacBook\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ "
export CLICOLOR=1
export LSCOLORS=ExFxBxDxCxegedabagacad
alias ls='ls -Gh'
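The completion function builds its host list by keeping the ~/.ssh/config lines that start with "Host " and taking their second field. compgen -W then keeps the candidates that start with the word being typed. A hypothetical Java sketch of those two steps (SshHostCompletion is not part of any tool) could be used to test the parsing on a sample config:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SshHostCompletion {

    // Like: grep "^Host " ~/.ssh/config | awk '{print $2}'
    static List<String> hostsFromConfig(List<String> configLines) {
        List<String> hosts = new ArrayList<>();
        for (String line : configLines) {
            if (line.startsWith("Host ")) {
                // awk splits on runs of blanks; keep the second field if present
                String[] fields = line.trim().split("\\s+");
                if (fields.length > 1) {
                    hosts.add(fields[1]);
                }
            }
        }
        return hosts;
    }

    // Like: compgen -W "$hosts" -- "$cur" (candidates starting with the typed prefix)
    static List<String> complete(List<String> hosts, String cur) {
        List<String> matches = new ArrayList<>();
        for (String host : hosts) {
            if (host.startsWith(cur)) {
                matches.add(host);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList("Host dev-box", "    HostName 10.0.0.1", "Host prod");
        System.out.println(complete(hostsFromConfig(config), "d"));
    }
}
```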

brew install svn (and append to .bash_profile: export PATH=/usr/local/bin:${PATH})
brew install maven
brew install cntlm

System Preferences -> Keyboard/Text: uncheck Use smart quotes and dashes

Finder -> Preferences/Advanced/Show all filename extensions