Author: Bartosz Wieczorek

Health

SLEEP

  • go to bed at 22:00, and definitely before 23:00; an hour of sleep before midnight is worth more than two hours of sleep after midnight; around midnight the heart regenerates; the later you go to bed, the harder it is to fall asleep, because processes in the body start waking up
  • to fall asleep: take a bath or shower, wash your face, dim the lights earlier in the evening, put the phone away, listen to your breathing, take calm deep breaths, recall 5 nice things from the past instead of thinking about what has to be done, and write down on paper before bed what to do the next day so the mind does not have to keep thinking about it

 


my MapReduce word count job

Get Hadoop version:

[bdaldr@bdaolc011node18 hadoop]$ hadoop version
Hadoop 2.6.0-cdh5.10.1
Subversion http://github.com/cloudera/hadoop -r b97747d0d68a45b4833fb0826949a6ae5bc698a6
Compiled by jenkins on 2017-03-20T09:39Z
Compiled with protoc 2.5.0
From source with checksum 892f9d53dc9838c14aae81b2addfdd90
This command was run using /opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/hadoop-common-2.6.0-cdh5.10.1.jar

Java MapReduce code:

package com.bawi.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // for windows we need to set hadoop.home.dir to parent dir of bin/winutils.exe
        if (System.getProperty("os.name").toLowerCase().contains("windows")) {
            System.setProperty("hadoop.home.dir", System.getProperty("user.dir"));
        }

        Configuration conf = new Configuration();
        // conf.set("fs.defaultFS", "hdfs://localhost:8020");
        Job job = Job.getInstance(conf, "wordcount");

        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }

}
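
Since summing is associative and commutative, the same Reduce class could optionally be registered as a combiner to pre-aggregate counts on the map side and shrink the shuffle (not done in the run below; the Combine input records=0 counter confirms it). The one extra line would go in main(), next to the other job.set* calls:

// optional: reuse the reducer as a combiner for map-side pre-aggregation
job.setCombinerClass(Reduce.class);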

pom.xml (with Hadoop 2.6.0-cdh5.10.1)

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

   <modelVersion>4.0.0</modelVersion>
   <groupId>com.bawi</groupId>
   <artifactId>my-hadoop-word-count</artifactId>
   <version>0.0.1-SNAPSHOT</version>

   <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <hadoop-cloudera.version>2.6.0-cdh5.10.1</hadoop-cloudera.version>
   </properties>

   <dependencies>
      <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-common</artifactId>
         <version>${hadoop-cloudera.version}</version>
      </dependency>

      <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-mapreduce-client-common</artifactId>
         <version>${hadoop-cloudera.version}</version>
      </dependency>

   </dependencies>

   <build>
      <plugins>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>3.0.2</version>
            <configuration>
               <archive>
                  <manifest>
                     <addClasspath>true</addClasspath>
                     <mainClass>com.bawi.hadoop.WordCount</mainClass>
                  </manifest>
               </archive>
            </configuration>
         </plugin>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.7.0</version>
            <configuration>
               <source>1.8</source>
               <target>1.8</target>
            </configuration>
         </plugin>
      </plugins>
   </build>

   <repositories>
      <repository>
         <!-- Central Repository -->
         <id>central</id>
         <url>http://repo1.maven.org/maven2/</url>
         <releases>
            <enabled>true</enabled>
         </releases>
         <snapshots>
            <enabled>true</enabled>
         </snapshots>
      </repository>
      <repository>
         <!-- Cloudera Repository -->
         <id>cloudera</id>
         <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
         <releases>
            <enabled>true</enabled>
         </releases>
         <snapshots>
            <enabled>true</enabled>
         </snapshots>
      </repository>
   </repositories>

</project>

Execution:

[me@node01 bartek]$ find
../output
./my-hadoop-word-count-0.0.1-SNAPSHOT.jar
./input
./input/file3.txt
./input/file1.txt
./input/file2.txt

[me@node01 bartek]$ cat input/file1.txt 
Hello Haddop file1
Bartek
[me@node01 bartek]$ cat input/file2.txt 
Hello file2
file2
[me@node01 bartek]$ cat input/file3.txt 
file3
file3
file3

[me@node01 bartek]$ hadoop fs -mkdir bartek
[me@node01 bartek]$ hadoop fs -mkdir bartek/input
[me@node01 bartek]$ hadoop fs -copyFromLocal input/* bartek/input/
[me@node01 bartek]$ hadoop fs -ls bartek/input/
Found 3 items
-rw-r--r-- 2 me me 26 2017-12-15 11:48 bartek/input/file1.txt
-rw-r--r-- 2 me me 18 2017-12-15 11:48 bartek/input/file2.txt
-rw-r--r-- 2 me me 19 2017-12-15 11:49 bartek/input/file3.txt

[me@node01 bartek]$ hadoop jar my-hadoop-word-count-0.0.1-SNAPSHOT.jar bartek/input/ bartek/output
17/12/15 11:49:52 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm727
17/12/15 11:49:52 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/12/15 11:49:52 INFO input.FileInputFormat: Total input paths to process : 3
17/12/15 11:49:52 INFO mapreduce.JobSubmitter: number of splits:3
17/12/15 11:49:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510690151596_34067
17/12/15 11:49:53 INFO impl.YarnClientImpl: Submitted application application_1510690151596_34067
17/12/15 11:49:53 INFO mapreduce.Job: The url to track the job: http://node02.sabre.com:8088/proxy/application_1510690151596_34067/
17/12/15 11:49:53 INFO mapreduce.Job: Running job: job_1510690151596_34067
17/12/15 11:50:00 INFO mapreduce.Job: Job job_1510690151596_34067 running in uber mode : false
17/12/15 11:50:00 INFO mapreduce.Job: map 0% reduce 0%
17/12/15 11:50:06 INFO mapreduce.Job: map 100% reduce 0%
17/12/15 11:50:13 INFO mapreduce.Job: map 100% reduce 10%
17/12/15 11:50:14 INFO mapreduce.Job: map 100% reduce 24%
17/12/15 11:50:15 INFO mapreduce.Job: map 100% reduce 32%
17/12/15 11:50:16 INFO mapreduce.Job: map 100% reduce 93%
17/12/15 11:50:18 INFO mapreduce.Job: map 100% reduce 100%
17/12/15 11:50:19 INFO mapreduce.Job: Job job_1510690151596_34067 completed successfully
17/12/15 11:50:19 INFO mapreduce.Job: Counters: 51
 File System Counters
 FILE: Number of bytes read=2342
 FILE: Number of bytes written=15021180
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=417
 HDFS: Number of bytes written=50
 HDFS: Number of read operations=345
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=224
 Job Counters 
 Killed reduce tasks=1
 Launched map tasks=3
 Launched reduce tasks=112
 Data-local map tasks=2
 Rack-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=54376
 Total time spent by all reduces in occupied slots (ms)=5933048
 Total time spent by all map tasks (ms)=13594
 Total time spent by all reduce tasks (ms)=741631
 Total vcore-seconds taken by all map tasks=13594
 Total vcore-seconds taken by all reduce tasks=741631
 Total megabyte-seconds taken by all map tasks=27840512
 Total megabyte-seconds taken by all reduce tasks=3037720576
 Map-Reduce Framework
 Map input records=8
 Map output records=10
 Map output bytes=92
 Map output materialized bytes=5478
 Input split bytes=354
 Combine input records=0
 Combine output records=0
 Reduce input groups=6
 Reduce shuffle bytes=5478
 Reduce input records=10
 Reduce output records=6
 Spilled Records=20
 Shuffled Maps =336
 Failed Shuffles=0
 Merged Map outputs=336
 GC time elapsed (ms)=17346
 CPU time spent (ms)=145210
 Physical memory (bytes) snapshot=39383625728
 Virtual memory (bytes) snapshot=424355987456
 Total committed heap usage (bytes)=131861577728
 Shuffle Errors
 BAD_ID=0
 CONNECTION=0
 IO_ERROR=0
 WRONG_LENGTH=0
 WRONG_MAP=0
 WRONG_REDUCE=0
 File Input Format Counters 
 Bytes Read=63
 File Output Format Counters 
 Bytes Written=50
[me@node01 bartek]$ hadoop fs -cat bartek/input/file1.txt
Hello Haddop file1
Bartek
[me@node01 bartek]$ hadoop fs -cat bartek/input/file2.txt
Hello file2
file2
[me@node01 bartek]$ hadoop fs -cat bartek/input/file3.txt
file3
file3
file3

[me@node01 bartek]$ hadoop fs -ls bartek/output
Found 113 items
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/_SUCCESS
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00000
...
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00010
-rw-r--r-- 2 me me 9 2017-12-15 11:50 bartek/output/part-r-00011
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00012
...
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00035
-rw-r--r-- 2 me me 8 2017-12-15 11:50 bartek/output/part-r-00036
-rw-r--r-- 2 me me 8 2017-12-15 11:50 bartek/output/part-r-00037
-rw-r--r-- 2 me me 8 2017-12-15 11:50 bartek/output/part-r-00038
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00039
...
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00103
-rw-r--r-- 2 me me 9 2017-12-15 11:50 bartek/output/part-r-00104
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00105
...
-rw-r--r-- 2 me me 0 2017-12-15 11:50 bartek/output/part-r-00111

[me@node01 output]$ hadoop fs -getmerge bartek/output/part* result.txt
[me@node01 output]$ cat result.txt 
Haddop 1
Hello 2
file1 1
file2 2
file3 3
Bartek 1

Statistics:

Map Tasks for job_1510690151596_34067
ID                                State     Start Time                     Finish Time                    Elapsed Time
task_1510690151596_34067_m_000002 SUCCEEDED Fri Dec 15 18:50:00 +0100 2017 Fri Dec 15 18:50:05 +0100 2017 4sec
task_1510690151596_34067_m_000001 SUCCEEDED Fri Dec 15 18:50:00 +0100 2017 Fri Dec 15 18:50:05 +0100 2017 4sec
task_1510690151596_34067_m_000000 SUCCEEDED Fri Dec 15 18:50:00 +0100 2017 Fri Dec 15 18:50:05 +0100 2017 4sec

syslog (excerpt) from a container:

2017-12-15 11:50:05,097 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://node01/user/me/bartek/input/file1.txt:0+26

 

Job Name: wordcount
User Name: me
Queue: root.me
State: SUCCEEDED
Uberized: false
Submitted: Fri Dec 15 11:49:53 CST 2017
Started: Fri Dec 15 11:49:58 CST 2017
Finished: Fri Dec 15 11:50:17 CST 2017
Elapsed: 19sec
Diagnostics:
Average Map Time 4sec
Average Shuffle Time 6sec
Average Merge Time 0sec
Average Reduce Time 0sec
Task Type  Total  Complete
Map        3      3
Reduce     112    112

Attempt Type  Failed  Killed  Successful
Maps          0       0       3
Reduces       0       0       112

Full logs from all containers are available via:

[me@node01 ~]$ yarn logs -applicationId application_1510690151596_34067 -appOwner me

 

Container: container_e166_1510690151596_34067_01_000001 on bdaolc011node15.sabre.com_8041

2017-12-15 11:49:56,119 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1510690151596_34067_000001

2017-12-15 11:49:57,724 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1510690151596_34067 = 63. Number of splits = 3
2017-12-15 11:49:57,733 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1510690151596_34067 = 112
2017-12-15 11:49:58,562 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1510690151596_34067Job Transitioned from SETUP to RUNNING
2017-12-15 11:49:58,603 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1510690151596_34067, File: hdfs://node01:8020/user/me/.staging/job_1510690151596_34067/job_1510690151596_34067_1.jhist
2017-12-15 11:49:58,713 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1510690151596_34067_m_000000 Task Transitioned from NEW to SCHEDULED

2017-12-15 11:50:00,584 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
2017-12-15 11:50:00,586 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e166_1510690151596_34067_01_000002 to attempt_1510690151596_34067_m_000001_0
2017-12-15 11:50:00,587 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e166_1510690151596_34067_01_000003 to attempt_1510690151596_34067_m_000002_0
2017-12-15 11:50:00,587 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e166_1510690151596_34067_01_000004 to attempt_1510690151596_34067_m_000000_0

2017-12-15 11:50:05,331 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1510690151596_34067_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-12-15 11:50:05,331 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 3
2017-12-15 11:50:05,594 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:112 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:3 AssignedReds:0 CompletedMaps:3 CompletedReds:0 ContAlloc:3 ContRel:0 HostLocal:2 RackLocal:1
2017-12-15 11:50:05,595 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:258662, vCores:92>
2017-12-15 11:50:05,595 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold reached. Scheduling reduces.
2017-12-15 11:50:05,595 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. Ramping up all remaining reduces:112

2017-12-15 11:50:07,611 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 112
2017-12-15 11:50:07,612 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2017-12-15 11:50:07,612 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e166_1510690151596_34067_01_000005 to attempt_1510690151596_34067_r_000000_0
2017-12-15 11:50:07,613 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2017-12-15 11:50:07,613 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e166_1510690151596_34067_01_000006 to attempt_1510690151596_34067_r_000001_0

2017-12-15 11:50:17,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1510690151596_34067_r_000049_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
2017-12-15 11:50:17,815 INFO [Thread-200] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to 
2017-12-15 11:50:17,817 INFO [Thread-200] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://node01:19888/jobhistory/job/job_1510690151596_34067
2017-12-15 11:50:17,821 INFO [Thread-200] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2017-12-15 11:50:18,823 INFO [Thread-200] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:6 CompletedMaps:3 CompletedReds:112 ContAlloc:116 ContRel:0 HostLocal:2 RackLocal:1
2017-12-15 11:50:18,823 INFO [Thread-200] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://node01/user/me/.staging/job_1510690151596_34067

 

[me@node01 ~]$ yarn logs -applicationId application_1510690151596_34067 -appOwner me | grep Container: | wc -l
116

(1 application master + 3 mappers + 112 reducers)

We can change the number of reducers in code with:

job.setNumReduceTasks(10);

to get:

 Job Counters 
 Launched map tasks=3
 Launched reduce tasks=10
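
The WARN in the job output above ("Hadoop command-line option parsing not performed. Implement the Tool interface...") points at an alternative: with ToolRunner the generic options are parsed for us, so the reducer count can also be set on the command line as -D mapreduce.job.reduces=10 without recompiling. A minimal sketch of such a driver (the WordCountTool name is illustrative; it reuses the Map and Reduce classes above):

package com.bawi.hadoop;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries generic options such as -D mapreduce.job.reduces=10
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(WordCountTool.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCount.Map.class);
        job.setReducerClass(WordCount.Reduce.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before passing the remaining args to run()
        System.exit(ToolRunner.run(new WordCountTool(), args));
    }
}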

If we increase the input text file size to 1.3GB, which is above the HDFS block size (256MB) defined in http://node01:50070/conf (active master):

<property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
    <source>hdfs-site.xml</source>
</property>

then we get 5 input splits for a single file (1.3 GB = 1,300,000,000 bytes; 1,300,000,000 / 268,435,456 ≈ 4.8, rounded up to 5):

 Job Counters 
 Launched map tasks=5
 Launched reduce tasks=115

Note that for a mapper or reducer task we may see speculative execution, where a duplicate attempt of a slow task is launched and the redundant attempt is killed once the first one finishes:

attempt_1510690151596_37594_r_000017_0 SUCCEEDED reduce
attempt_1510690151596_37594_r_000017_1 KILLED
Speculation: attempt_1510690151596_37594_r_000017_0 succeeded first!
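
Speculative execution can be switched off per job if the duplicate attempts are unwanted. A minimal sketch in the driver, before the job is created (using the standard Hadoop 2.x property names):

// disable speculative (duplicate) attempts for map and reduce tasks
conf.setBoolean("mapreduce.map.speculative", false);
conf.setBoolean("mapreduce.reduce.speculative", false);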

zookeeper and kafka on vpn – changing the bind address from 0.0.0.0

me@MacBook:~$ vim update-ip-to-hostname-in-etc-host.sh
#!/bin/bash
# pick the last non-loopback IPv4 address (e.g. the VPN interface)
ip=$(ifconfig | grep 'inet ' | grep -v 127.0.0.1 | awk '{ print $2 }' | tail -n 1)
if [ "$ip" == "" ]; then
    ip=127.0.0.1
fi
echo "Updating IP HOSTNAME mapping to '$ip' '$HOSTNAME' in /etc/hosts ..."
# BSD sed on macOS needs -i "" for in-place editing without a backup file
sudo sed -i "" "s/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\ \ \ $HOSTNAME/$ip\ \ \ $HOSTNAME/g" /etc/hosts
echo "Results for $HOSTNAME:"
grep $HOSTNAME /etc/hosts

Results:

me@MacBook:~$ ./update-ip-to-hostname-in-etc-host.sh  
Updating IP HOSTNAME mapping to '10.162.224.245'   'C02S53DRG8WM' in /etc/hosts ...
Results for C02S53DRG8WM:
#10.162.224.245   C02S53DRG8WM #export SPARK_LOCAL_IP=10.162.209.59 or SPARK_LOCAL_IP="127.0.0.1"
10.162.224.245   C02S53DRG8WM

me@MacBook:~/dev/env/kafka_2.11-0.11.0.0$ vim config/zookeeper.properties
# Added
clientPortAddress=MacBook

Results:

[2017-09-08 16:08:14,726] INFO binding to port C02S53DRG8WM/10.162.224.245:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-09-08 16:08:20,801] INFO Accepted socket connection from /10.162.224.245:50432 (org.apache.zookeeper.server.NIOServerCnxnFactory)
...
[2017-09-08 16:08:20,831] INFO Established session 0x15e61d1d3990000 with negotiated timeout 6000 for client /10.162.224.245:50432 (org.apache.zookeeper.server.ZooKeeperServer)

me@MacBook:~/dev/env/kafka_2.11-0.11.0.0$ vim config/server.properties
#zookeeper.connect=localhost:2181
zookeeper.connect=MacBook:2181

Results:

[2017-09-08 16:08:20,801] INFO Socket connection established to C02S53DRG8WM/10.162.224.245:2181, initiating session (org.apache.zookeeper.ClientCnxn)
...
[2017-09-08 16:08:21,340] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
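
Both clientPortAddress and zookeeper.connect rely on the hostname resolving to the VPN address written to /etc/hosts. A minimal Java sketch to sanity-check what a name resolves to (the ResolveHost class name is illustrative):

import java.net.InetAddress;

public class ResolveHost {
    public static void main(String[] args) throws Exception {
        // defaults to the local hostname; pass the /etc/hosts name as an argument to check it
        String hostname = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        System.out.println(hostname + " -> " + InetAddress.getByName(hostname).getHostAddress());
    }
}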

Remapping winkey (Windows key) when running Windows and Ubuntu in VirtualBox on Mac OSX

If you are used to the Mac keyboard and want to keep using the command key for copy, paste, select, etc. while running Windows or Ubuntu in VirtualBox (where the command key acts as the Windows key), you can remap the left Windows key to the left control key as follows:

  1. Ubuntu
    gsettings set org.gnome.desktop.input-sources xkb-options "['altwin:ctrl_win']"
    and optionally set the terminal paste shortcut as well
    gsettings set org.gnome.Terminal.Legacy.Keybindings:/org/gnome/terminal/legacy/keybindings/ paste '<Primary>v'
  2. Windows
    by adding a registry entry:
    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout]
    "Scancode Map"=hex:00,00,00,00,00,00,00,00,02,00,00,00,1d,00,5b,e0,00,00,00,00
    (02,00,00,00 is the entry count including the null terminator, and 1d,00,5b,e0 maps the left Windows key, scancode e0 5b, to left Ctrl, scancode 00 1d)
    based on https://superuser.com/questions/1190329/can-i-switch-the-alt-and-ctrl-keys-on-my-keyboard

and then restarting Ubuntu or Windows.

xsd schema – repeating elements via unbounded choice and sequence

For the schema below with choice:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.bawi.com/schema" targetNamespace="http://www.bawi.com/schema" elementFormDefault="qualified">
    <xs:element name="MyRootElement">
        <xs:complexType>

            <!-- allow either (zero or more A elements) or (exactly one B) -->
            <xs:choice>
                <xs:element name="A" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="B" />
            </xs:choice>
        </xs:complexType>
    </xs:element>
</xs:schema>

(screenshot: diagram of the choice-based schema)

we get the following valid XMLs:
– no children

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
</MyRootElement>

– zero or more A element(s)

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <A></A>
  <A/>
</MyRootElement>

– exactly one B element

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <B/>
</MyRootElement>

However, a combination of A and B is not allowed:

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <A></A>
  <B/>
</MyRootElement>

as we get the error

Invalid content was found starting with element 'B'. One of '{"http://www.bawi.com/schema":A}' is expected.

and this XML is also invalid:

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <B/>
  <A></A>
</MyRootElement>

due to

Invalid content was found starting with element 'A'. No child element is expected at this point.

In order to get an XML with many A elements and a single B, we define a sequence instead:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.bawi.com/schema" targetNamespace="http://www.bawi.com/schema" elementFormDefault="qualified">
    <xs:element name="MyRootElement">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="A" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="B" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

(screenshot: diagram of the sequence-based schema)

so that the XML below is valid:

<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <A></A>
  <B/>
</MyRootElement>

For validation I have used the following code:

import org.xml.sax.SAXException;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;
import java.io.IOException;

public class MyValidator {
    public static void main(String[] args) throws SAXException, IOException {
        File schemaFile = new File("schema.xsd");
        Source xmlFile = new StreamSource("generated.xml");
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(schemaFile);
        Validator validator = schema.newValidator();
        validator.validate(xmlFile);
        System.out.println(xmlFile.getSystemId() + " is valid");
    }
}
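
The cvc-... messages quoted in this post are the SAXException messages raised by validate; to print them instead of failing with a stack trace, the last lines of main could be wrapped like this (a small variant of the code above):

try {
    validator.validate(xmlFile);
    System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
    // the exception message carries the cvc-... violation text
    System.out.println(xmlFile.getSystemId() + " is invalid: " + e.getMessage());
}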

xsd schema – declaring an xml element with attributes only

For the schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.bawi.com/schema" targetNamespace="http://www.bawi.com/schema" elementFormDefault="qualified">
  <xs:element name="MyRootElement">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="MyElement">
          <xs:complexType>
            <xs:attribute name="myAttribute"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

(screenshot: schema diagram)

we have a valid XML:

<?xml version="1.0" encoding="UTF-8"?>
<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <MyElement myAttribute="anyType"/>
</MyRootElement>

but when we add a text body (e.g. ‘aaa’) to the XML:

<?xml version="1.0" encoding="UTF-8"?>
<MyRootElement xmlns="http://www.bawi.com/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bawi.com/schema schema.xsd">
  <MyElement myAttribute="anyType">aaa</MyElement>
</MyRootElement>

then we get a validation error:

lineNumber: 3; columnNumber: 51; cvc-complex-type.2.1: Element 'MyElement' must have no character or element information item [children], because the type's content type is empty.

In order to fix it we need to add simpleContent with an extension to the schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.bawi.com/schema" targetNamespace="http://www.bawi.com/schema" elementFormDefault="qualified">
  <xs:element name="MyRootElement">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="MyElement">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="myAttribute"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Then both XMLs above will be valid.
Note the text-like marker in the MyElement icon:
(screenshot: schema diagram showing the MyElement icon)

An empty element such as

<MyElement/>

is allowed when the schema defines it as

<xs:element name="MyElement"/>

or

<xs:element name="MyElement" type="xs:string"/>

However, when we change the schema to an int type such as

<xs:element name="MyElement" type="xs:int"/>

then we get a validation error:

lineNumber: 3; columnNumber: 15; cvc-datatype-valid.1.2.1: '' is not a valid value for 'integer'.