Category: Uncategorized

Install a specific version of Terraform with brew on macOS

Requirement: install Terraform 0.12.29, however the terraform@0.12 formula points to a newer version: 0.12.30
Steps:
Open https://github.com/Homebrew/homebrew-core/blob/master/Formula/terraform@0.12.rb -> History -> choose the commit with the specific version, e.g. https://github.com/Homebrew/homebrew-core/commit/13fce7f5504ba838199557bde565b824128e4362#diff-20915addc0e9fc9aa51da2f0748e48706d22db60d4b45b66415a8fd116619f37 -> ... (3 dots) -> View file -> Raw -> download https://raw.githubusercontent.com/Homebrew/homebrew-core/13fce7f5504ba838199557bde565b824128e4362/Formula/terraform%400.12.rb as terraform@0.12.rb
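Equivalently, the raw formula can be downloaded from the command line (a small sketch using the raw URL above; the target path matches the install command below):

# download the formula for terraform 0.12.29 at the chosen commit
curl -fsSL -o ~/Downloads/terraform@0.12.rb \
  "https://raw.githubusercontent.com/Homebrew/homebrew-core/13fce7f5504ba838199557bde565b824128e4362/Formula/terraform%400.12.rb"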

# install terraform 0.12.29 from source based on rb file

brew install --build-from-source ~/Downloads/terraform@0.12.rb

# keep specific version and exclude from update

brew pin terraform@0.12
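To verify what was installed and pinned (optional; a sketch assuming a recent Homebrew, where versioned formulae are keg-only and may need linking):

brew list --versions terraform@0.12
# versioned formulae are keg-only; link if terraform is not already on PATH
brew link --force terraform@0.12
terraform version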

Spark Scala and Java – creating RDD, DataFrame and Dataset

Spark is written in Scala. The Spark Java API is a set of wrappers around the Scala API, so that Java developers do not have to use the Scala language libraries directly.

1. Create an instance of org.apache.spark.sql.SparkSession using the builder (same in both languages; the Scala one-liner is shown right after the Java example):

SparkSession sparkSession = SparkSession.builder().master("local[*]").getOrCreate();
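For completeness, the equivalent Scala one-liner:

val sparkSession: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()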

2. Create org.apache.spark.rdd.RDD / org.apache.spark.api.java.JavaRDD; an RDD is a partitioned collection of objects (without a schema).
Use sparkContext directly to parallelize a scala.collection.Seq[A]:
Scala:

val stringRDD: RDD[String] = sparkSession.sparkContext.parallelize(Seq("a", "bb", "ccc"))

val rowRDD: RDD[Row] = stringRDD.map(string => RowFactory.create(string))

case class Person(name: String, age: Integer) // defined outside main method

val personRDD: RDD[Person] = stringRDD.map(string => Person(string, new java.util.Random().nextInt(100)))
personRDD.foreach(p => println(p)) 
personRDD.foreach(println) // same as above

output:

Person(ccc,63)
Person(a,50)
Person(bb,93)

Use JavaSparkContext, which wraps sparkContext, to parallelize a java.util.List<T>.
Java:

JavaRDD<String> stringJavaRDD 
  = new JavaSparkContext(sparkSession.sparkContext()).parallelize(Arrays.asList("a", "bb", "ccc"));

JavaRDD<Row> rowJavaRDD = stringJavaRDD.map(string -> RowFactory.create(string));

public static class Person implements Serializable { // class needs to be public static and defined outside the main method
    private String name;
    private int age;
    public Person() { } // default constructor and setters/getters are required by spark
    public Person(String name, int age) { this.name = name; this.age = age; }
    public void setName(String name) { this.name = name; }
    public void setAge(int age) { this.age = age; }
    public String getName() { return name; }
    public int getAge() { return age; }
    @Override public String toString() { return "Person{name='" + name + "',age='" + age + "'}"; }
}

JavaRDD<Person> personJavaRDD = stringJavaRDD.map(name -> new Person(name, new Random().nextInt(100)));

personJavaRDD.foreach(p -> System.out.println(p)); // no .show() method on rdd
// personJavaRDD.foreach(System.out::println); // java.io.NotSerializableException: java.io.PrintStream

// using scala api in java requires scala imports and lots of boilerplate code:
Iterator<String> iterator = Arrays.asList("a", "b").iterator();
// import scala.collection.JavaConverters;
// import scala.collection.Seq;
// import scala.reflect.ClassManifestFactory;
// import scala.reflect.ClassTag$;
Seq<String> seq = JavaConverters.asScalaIteratorConverter(iterator).asScala().toSeq();
RDD<String> rdd = sparkSession.sparkContext().parallelize(seq, 2, 
                                               ClassTag$.MODULE$.apply(String.class));

// using explicit scala code in java (instead of implicit import)
implicits$ implicits = sparkSession.implicits();  // compiles (but highlighted as an error in IntelliJ, no warnings in Eclipse Scala IDE)
Encoder<String> stringEncoder = implicits.newStringEncoder();
Dataset<Row> rowDataset1 = implicits.localSeqToDatasetHolder(seq, stringEncoder).toDF();

output:

Person{name='ccc',age='59'}
Person{name='a',age='25'}
Person{name='bb',age='57'}

3. Create org.apache.spark.sql.DataFrame and org.apache.spark.sql.Dataset:
DataFrame is not available as a separate class in Java because DataFrame is a Scala type alias for Dataset[Row].
A Dataset is typed; a Dataset typed with Row (Dataset<Row>) is a DataFrame.
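The alias is defined in the org.apache.spark.sql package object in the Spark sources:

type DataFrame = Dataset[Row]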

Scala:

// .toDF/.toDS requires sparkSession.implicits._
import sparkSession.implicits._

// 1. from rdd
val df: DataFrame = rdd.toDF() // defaults to `value` as column name: value: string (nullable = true)

val df2: DataFrame = rdd.toDF("str_name") // explicitly set column name: str_name: string (nullable = true)

val ds: Dataset[String] = rdd.toDS() // defaults to `value` as column name: value: string (nullable = true)

val personDF: DataFrame = personRDD.toDF()
personDF.printSchema() // both Dataset and Dataframe have schema associated
personDF.show()


// 2. from Seq
val df11: DataFrame = Seq("X", "Y", "Z").toDF()

val df12: DataFrame = Seq("X", "Y", "Z").toDF("my_string")

val ds11: Dataset[String] = Seq("X", "Y", "Z").toDS()

val personDf: DataFrame = Seq(Person("Bob", 21), Person("Alice", 20)).toDF()

val personDs: Dataset[Person] = Seq(Person("Bob", 21), Person("Alice", 20)).toDS()

// 3. from sparkSession.createDataframe/createDataset
// a) using RDD[Row] and Schema StructType
val df31: DataFrame = sparkSession.createDataFrame(rowRDD, StructType(Seq(StructField("my_column_name", StringType))))
// dataset can be created from RDD, Seq, util.List of type T and implicit Encoder[T]
val ds31: Dataset[Person] = sparkSession.createDataset(util.Arrays.asList(Person("Bob", 21)))

// 4. from / to Dataset / Dataframe:
val ds_2: Dataset[Person] = personDf.as[Person]
val df_2: DataFrame = personDs.toDF()
output (personDF.printSchema() and personDF.show()):

root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)

+----+---+
|name|age|
+----+---+
|   a| 84|
|  bb| 80|
| ccc| 89|
+----+---+

Java:

// 1. from sparkSession createDataframe/createDataset
StructField structField = DataTypes.createStructField("my_name", DataTypes.StringType, true);
StructType structType = DataTypes.createStructType((Collections.singletonList(structField)));
Dataset<Row> dataFrame = sparkSession.createDataFrame(rowJavaRDD, structType);

Dataset<Row> personDataframe = sparkSession.createDataFrame(personJavaRDD, Person.class);
personDataframe.printSchema();
personDataframe.show();

Dataset<String> stringDataset = sparkSession.createDataset(stringJavaRDD.rdd(), Encoders.STRING());

Dataset<Person> personDataset = sparkSession.createDataset(
  Arrays.asList(new Person("Bob", 21), new Person("Alice", 20)), Encoders.bean(Person.class));
personDataset.show();

// 2. from / to Dataframe / Dataset
Dataset<Person> ds_2 = personDataframe.as(Encoders.bean(Person.class));
Dataset<Row> df_2 = personDataset.toDF();

output:

root
 |-- age: integer (nullable = false)
 |-- name: string (nullable = true)

+---+----+
|age|name|
+---+----+
| 19|   a|
| 56|  bb|
| 72| ccc|
+---+----+

+---+-----+
|age| name|
+---+-----+
| 21|  Bob|
| 20|Alice|
+---+-----+

4. Mapping RDDs, Dataframes and Datasets:
Scala:

val rowRDD: RDD[Row] = stringRDD.map(string => RowFactory.create(string))
val rowRDD2: RDD[Row] = stringRDD.map(RowFactory.create(_)) // as above

val intDataset: Dataset[Int] = stringDataset.map(string => string.length)
val intDataset2: Dataset[Int] = stringDataset.map(_.length) // as above

Java:

// Mapping RDD
// java functional interface org.apache.spark.api.java.function.Function
JavaRDD<Row> rowJavaRDD = stringJavaRDD.map(new Function<String, Row>() {  
    @Override
    public Row call(String string) throws Exception {
        return RowFactory.create(string);
    }
});

// method reference (org.apache.spark.api.java.function.Function is a functional interface, so a lambda or method reference works)
JavaRDD<Row> rowJavaRDD2 = stringJavaRDD.map(RowFactory::create); // as above

stringJavaRDD.foreach(s -> System.out.println(s));

// java.io.NotSerializableException: java.io.PrintStream
// stringJavaRDD.foreach(System.out::println); 

RDD<Row> rowRDD = rowJavaRDD.rdd();
// requires import of scala.reflect.ClassManifestFactory and scala.runtime.AbstractFunction1
RDD<Row> rowRDD1 = rowRDD.map(new AbstractFunction1<Row, Row>() {
    @Override
    public Row apply(Row row) {
        return row;
    }
}, ClassManifestFactory.fromClass(Row.class));

// Mapping Dataset - requires an explicit encoder of the mapping output type (in Scala an implicit encoder is used)
// org.apache.spark.api.java.function.MapFunction (from the java package)
stringDataset.map(new MapFunction<String, Integer>() {
    @Override
    public Integer call(String value) throws Exception {
        return value.length();
    }
}, Encoders.INT());


// lambda usage, as MapFunction is a functional interface
// need to cast to MapFunction to avoid an ambiguous method call
stringDataset.map((MapFunction<String, Integer>) String::length, Encoders.INT());

// compilation error - scala.Function1 is Scala-specialized and requires implementing additional abstract methods such as apply$mcVJ$sp(long)
/*
stringDataset.map(new Function1<String, Integer>() {
    @Override
    public Integer apply(String string) {
        return string.length();
    }
}, Encoders.INT());
*/

// compilation error: incompatible types: scala.Function1 is not a functional interface, multiple non-overriding abstract methods found in interface scala.Function1
// stringDataset.map((Function1<String, Integer>) String::length, Encoders.INT());

// Ambiguous method call conflicting with map(Function1)
// stringDataset.map(value -> value.length(), Encoders.INT())
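A DataFrame (Dataset<Row>) is mapped the same way as a Dataset; a minimal sketch, assuming the dataFrame variable (Dataset<Row> with the single my_name string column) created in section 3:

// map each Row to its first (string) column, producing a typed Dataset<String>
Dataset<String> namesDataset = dataFrame.map(
    (MapFunction<Row, String>) row -> row.getString(0), Encoders.STRING());
namesDataset.show();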

youtube-dl

# list available formats
youtube-dl -F https://www.youtube.com/watch?v=hRGj4Z-zoHs

[youtube] hRGj4Z-zoHs: Downloading webpage
[info] Available formats for hRGj4Z-zoHs:
format code  extension  resolution note
249          webm       audio only tiny   58k , opus @ 50k (48000Hz), 27.72MiB
250          webm       audio only tiny   78k , opus @ 70k (48000Hz), 35.19MiB
140          m4a        audio only tiny  134k , m4a_dash container, mp4a.40.2@128k (44100Hz), 76.48MiB
251          webm       audio only tiny  149k , opus @160k (48000Hz), 62.13MiB
278          webm       256x144    144p   88k , webm container, vp9, 30fps, video only, 46.61MiB
160          mp4        256x144    144p   90k , avc1.4d400c, 30fps, video only, 24.41MiB
133          mp4        426x240    240p  176k , avc1.4d4015, 30fps, video only, 46.23MiB
242          webm       426x240    240p  189k , vp9, 30fps, video only, 58.39MiB
134          mp4        640x360    360p  318k , avc1.4d401e, 30fps, video only, 82.07MiB
243          webm       640x360    360p  389k , vp9, 30fps, video only, 111.22MiB
135          mp4        854x480    480p  507k , avc1.4d401f, 30fps, video only, 129.35MiB
244          webm       854x480    480p  611k , vp9, 30fps, video only, 173.36MiB
247          webm       1280x720   720p 1074k , vp9, 30fps, video only, 302.90MiB
136          mp4        1280x720   720p 1354k , avc1.4d401f, 30fps, video only, 359.40MiB
18           mp4        640x360    360p  432k , avc1.42001E, 30fps, mp4a.40.2@ 96k (44100Hz), 255.39MiB
22           mp4        1280x720   720p  737k , avc1.64001F, 30fps, mp4a.40.2@192k (44100Hz) (best)
# download video format 133 (240p mp4) and audio format 251 (opus), merged
youtube-dl -f 133+251 https://www.youtube.com/watch?v=hRGj4Z-zoHs
# cut from 00:05:13 to the end of the video, limited to 360p
url="https://www.youtube.com/watch?v=hRGj4Z-zoHs"
duration=$(youtube-dl "$url" --get-duration)
youtube-dl --postprocessor-args "-ss 00:05:13.00 -to $duration" -f 'bestvideo[height<=360]+bestaudio[height<=360]/best[height<=360]' "$url"
# extract audio only and convert it to mp3
youtube-dl --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=hRGj4Z-zoHs
# best video+audio limited to 360p
youtube-dl -f 'bestvideo[height<=360]+bestaudio[height<=360]/best[height<=360]' https://www.youtube.com/watch?v=hRGj4Z-zoHs
# convert m4a audio to mp3 with ffmpeg
ffmpeg -v 5 -y -i B.m4a -acodec libmp3lame -ac 2 -ab 192k B.mp3

Building an Avro schema programmatically with SchemaBuilder

1. maven pom.xml dependency:

    <dependencies>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.8.1</version>
        </dependency>
    </dependencies>

2. Java avro SchemaBuilder:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

public class MySchemaBuilder {
  public static void main(String[] args) throws IOException {
    Schema schema = SchemaBuilder.record("myRecordName").fields()
      .requiredInt("myRequiredInt")
      //.name("myRequiredInt2").type().intType().noDefault()

      .optionalDouble("myOptionalDouble")
      //.name("myOptionalDouble2").type().optional().doubleType()

      .nullableString("myNullableString", "myNullableStringDefaultValue")
      //.name("myNullableString2").type().nullable().stringType().stringDefault("myNullableStringDefaultValue2")

      .name("myRequiredArrayLongs").type().array().items().longType().noDefault()

      .name("myRequiredSubRecord")
        .type(
          SchemaBuilder.record("myRequiredSubRecordType").fields().requiredFloat("myRequiredFloat").endRecord()
        ).noDefault()

      .name("myOptionalArraySubRecords").type().nullable().array()
        .items(
          SchemaBuilder.record("myOptionalArraySubRecordType").fields().requiredBoolean("myRequiredBoolean").endRecord()
        ).noDefault()
    .endRecord();

    System.out.println(schema);

    File file = new File("my-file.avro");
    GenericData.Record record = new GenericData.Record(schema);

    record.put("myRequiredInt", 1);

    record.put("myOptionalDouble", 2.2d); // this line can be commented since optional long

    record.put("myNullableString", "abc"); // this line can be commented since optional string

    record.put("myRequiredArrayLongs", Arrays.asList(1L, 2L, 3L));  // required otherwise java.lang.NullPointerException: null of array in field myRequiredArrayLongs of myRecordName

    GenericData.Record myRequiredSubRecord = new GenericData.Record(schema.getField("myRequiredSubRecord").schema());
    myRequiredSubRecord.put("myRequiredFloat", 1.0f); // required otherwise null of int in field myRequiredFloat of myRequiredSubRecordType in field myRequiredSubRecord of myRecordName
    record.put("myRequiredSubRecord", myRequiredSubRecord); // required otherwise java.lang.NullPointerException: null of myRequiredSubRecordType in field myRequiredSubRecord of myRecordName

    GenericData.Record myOptionalArraySubRecord = new GenericData.Record(schema.getField("myOptionalArraySubRecords").schema().getTypes().get(0).getElementType());
    myOptionalArraySubRecord.put("myRequiredBoolean", true);
    record.put("myOptionalArraySubRecords", Arrays.asList(myOptionalArraySubRecord, myOptionalArraySubRecord));

    GenericDatumWriter<GenericRecord> genericDatumWriter = new GenericDatumWriter<>(schema);
    DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(genericDatumWriter);
    dataFileWriter.create(schema, file);
    dataFileWriter.append(record);
    dataFileWriter.close();
  }
}

3. Output schema

{
  "type": "record",
  "name": "myRecordName",
  "fields": [
    {
      "name": "myRequiredInt",
      "type": "int"
    },
    {
      "name": "myOptionalDouble",
      "type": [
        "null",
        "double"
      ],
      "default": null
    },
    {
      "name": "myNullableString",
      "type": [
        "string",
        "null"
      ],
      "default": "myNullableStringDefaultValue"
    },
    {
      "name": "myRequiredArrayLongs",
      "type": {
        "type": "array",
        "items": "long"
      }
    },
    {
      "name": "myRequiredSubRecord",
      "type": {
        "type": "record",
        "name": "myRequiredSubRecordType",
        "fields": [
          {
            "name": "myRequiredFloat",
            "type": "float"
          }
        ]
      }
    },
    {
      "name": "myOptionalArraySubRecords",
      "type": [
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "myOptionalArraySubRecordType",
            "fields": [
              {
                "name": "myRequiredBoolean",
                "type": "boolean"
              }
            ]
          }
        },
        "null"
      ]
    }
  ]
}

4. Reading
java -jar ~/Downloads/avro-tools-1.8.1.jar tojson my-file.avro

{
  "myRequiredInt": 1,
  "myOptionalDouble": {
    "double": 2.2
  },
  "myNullableString": {
    "string": "abc"
  },
  "myRequiredArrayLongs": [
    1,
    2,
    3
  ],
  "myRequiredSubRecord": {
    "myRequiredFloat": 1.0
  },
  "myOptionalArraySubRecords": {
    "array": [
      {
        "myRequiredBoolean": true
      },
      {
        "myRequiredBoolean": true
      }
    ]
  }
}

In case we do not populate optional/nullable fields we get:

{
  "myRequiredInt": 1,
  "myOptionalDouble": null,
  "myNullableString": null,
  "myRequiredArrayLongs": [
    1,
    2,
    3
  ],
  "myRequiredSubRecord": {
    "myRequiredFloat": 1.0
  },
  "myOptionalArraySubRecords": null
}
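The file can also be read back programmatically; a minimal sketch using Avro's GenericDatumReader/DataFileReader (same avro dependency as above):

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class MySchemaReader {
  public static void main(String[] args) throws IOException {
    // the writer schema is embedded in the avro file header, so no schema argument is needed
    try (DataFileReader<GenericRecord> dataFileReader =
             new DataFileReader<>(new File("my-file.avro"), new GenericDatumReader<>())) {
      for (GenericRecord record : dataFileReader) {
        System.out.println(record); // prints each record as JSON-like text
      }
    }
  }
}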

Google Cloud Dataflow integration with PubSub

1. Create a test topic and send a message to it:

gcloud pubsub topics create bartek-test-topic
Created topic [projects/xyz-123/topics/bartek-test-topic].
gcloud pubsub topics publish bartek-test-topic --message="hello from gcloud" \
  --attribute="origin=gcloud-sample,username=gcp"
messageIds:
- '1739874026833347'

2. Create output bucket

PROJECT_NAME=$(gcloud config get-value project)
BUCKET_NAME=$PROJECT_NAME-bartek-pubsub
gsutil mb gs://$BUCKET_NAME

3. Git clone and start the pipeline:

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
cd java-docs-samples/pubsub/streaming-analytics
mvn compile exec:java \
   -Dexec.mainClass=com.examples.pubsub.streaming.PubSubToGcs \
   -Dexec.cleanupDaemonThreads=false \
   -Dexec.args=" \
     --project=$PROJECT_NAME \
     --inputTopic=projects/$PROJECT_NAME/topics/bartek-test-topic \
     --output=gs://$BUCKET_NAME/samples/output \
     --runner=DataflowRunner \
     --region=us-central1 \
     --usePublicIps=false \
     --windowSize=2"
INFO: 2020-11-18T12:55:59.104Z: Starting 1 workers...
Nov 18, 2020 1:56:00 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2020-11-18T12:56:01.092Z: Pub/Sub resources set up for topic 'projects/xyz-123/topics/bartek-test-topic'.
Nov 18, 2020 1:56:04 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process

Nov 18, 2020 1:56:45 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2020-11-18T12:56:41.647Z: Worker configuration: n1-standard-4 in us-central1-f.
Nov 18, 2020 1:56:56 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2020-11-18T12:56:56.738Z: Workers have started successfully.
Nov 18, 2020 2:38:16 PM org.apache.beam.runners.dataflow.DataflowPipelineJob lambda$waitUntilFinish$0
WARNING: Job is already running in Google Cloud Platform, Ctrl-C will not cancel it.
To cancel the job in the cloud, run:
> gcloud dataflow jobs --project=xyz-123 cancel --region=us-central1 2020-11-18_04_55_47-17323678150575849943
gsutil ls gs://$BUCKET_NAME/samples
gs://sab-dev-dap-common-4288-bartek-pubsub/samples/output-13:36-13:38-0-of-1
gsutil cat gs://$BUCKET_NAME/samples/*
hello bartek test topic
gcloud dataflow jobs --project=xyz-123 cancel --region=us-central1 2020-11-18_06_55_04-15309600420981137368
Cancelled job [2020-11-18_06_55_04-15309600420981137368]
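Optional cleanup afterwards (a sketch; only if the topic and the bucket are no longer needed):

gcloud pubsub topics delete bartek-test-topic
gsutil -m rm -r gs://$BUCKET_NAME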

HTTPS SSL on localhost with Ubuntu, Apache2, Chrome and Firefox

sudo apt-get install apache2 libnss3-tools net-tools curl -y
sudo apache2ctl configtest
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
Syntax OK
# Fix apache 2 warning
sudo cp /etc/apache2/apache2.conf /etc/apache2/apache2.conf.orig
sudo bash -c 'echo "ServerName 127.0.0.1"  >> /etc/apache2/apache2.conf'
sudo apache2ctl configtest
Syntax OK
mkdir ca
cd ca

Based on https://stackoverflow.com/questions/7580508/getting-chrome-to-accept-self-signed-localhost-certificate/60516812#60516812

######################
# Become a Certificate Authority
######################

# Generate private key
openssl genrsa -des3 -out myCA.key 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
.........................+++++
.............................................................................................+++++
e is 65537 (0x010001)
Enter pass phrase for myCA.key:
Verifying - Enter pass phrase for myCA.key:
# Generate root certificate
openssl req -x509 -new -nodes -key myCA.key -sha256 -days 825 -out myCA.pem
Enter pass phrase for myCA.key:
Can't load /home/me/.rnd into RNG
140349095199168:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/home/me/.rnd
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:PL
State or Province Name (full name) [Some-State]:Malopolskie
Locality Name (eg, city) []:Krakow
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Certificate Authority Company
Organizational Unit Name (eg, section) []:My Certificate Authority Company Department
Common Name (e.g. server FQDN or YOUR name) []:My CA
Email Address []:myCA@localhost
########################
# Create CA-signed certs
########################

NAME=localhost # Use your own domain name e.g. domain.com

# Generate a private key
openssl genrsa -out $NAME.key 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
...................................+++++
........................+++++
e is 65537 (0x010001)
# Create a certificate-signing request
openssl req -new -key $NAME.key -out $NAME.csr
Can't load /home/me/.rnd into RNG
139631538540992:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/home/me/.rnd
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:PL
State or Province Name (full name) [Some-State]:Malopolskie
Locality Name (eg, city) []:Krakow
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company
Organizational Unit Name (eg, section) []:My Company Department
Common Name (e.g. server FQDN or YOUR name) []:localhost
Email Address []:me@localhost

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
# Create a config file for the extensions
>$NAME.ext cat <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = $NAME # Be sure to include the domain name here because Common Name is not so commonly honoured by itself
DNS.2 = bar.$NAME # Optionally, add additional domains (I've added a subdomain here)
IP.1 = 10.0.2.15 # Optionally, add an IP address (if the connection which you have planned requires it)
EOF

# Create the signed certificate 
openssl x509 -req -in $NAME.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial \
 -out $NAME.crt -days 825 -sha256 -extfile $NAME.ext 
Signature ok
subject=C = PL, ST = Malopolskie, L = Krakow, O = My Company, OU = My Company Department, CN = localhost, emailAddress = me@localhost
Getting CA Private Key
Enter pass phrase for myCA.key:
# Check your work 
openssl verify -CAfile myCA.pem -verify_hostname bar.localhost localhost.crt 
localhost.crt: OK
# Copy private key and certificate 
sudo cp localhost.key /etc/ssl/private/ 
sudo cp localhost.crt /etc/ssl/certs/ 

# Backup config 
sudo cp /etc/apache2/sites-available/default-ssl.conf /etc/apache2/sites-available/default-ssl.conf.orig 

# Create SSL Apache config 
# quote EOF so that ${APACHE_LOG_DIR} is not expanded by the shell and stays literal for Apache
>default-ssl.conf cat <<-'EOF'
<IfModule mod_ssl.c>
    <VirtualHost _default_:443>
        DocumentRoot /var/www/html

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

        SSLEngine on
        SSLCertificateFile    /etc/ssl/certs/localhost.crt
        SSLCertificateKeyFile /etc/ssl/private/localhost.key
    </VirtualHost>
</IfModule>
EOF

sudo cp default-ssl.conf /etc/apache2/sites-available/default-ssl.conf
sudo systemctl restart apache2
sudo apache2ctl configtest
Syntax OK
telnet localhost 443
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
# Enable port 443 (SSL)
sudo a2enmod ssl
sudo systemctl restart apache2
telnet localhost 443
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

curl -vv https://localhost
* Rebuilt URL to: https://localhost/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* error:1408F10B:SSL routines:ssl3_get_record:wrong version number
* stopped the pause stream!
* Closing connection 0
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number


# Start firefox
firefox https://localhost

Secure Connection Failed An error occurred during a connection to localhost. SSL received a record that exceeded the maximum permissible length. Error code: SSL_ERROR_RX_RECORD_TOO_LONG
# enable default-ssl conf, so that /etc/apache2/sites-enabled/ will have a symlink to default-ssl.conf -> ../sites-available/default-ssl.conf
sudo a2ensite default-ssl
sudo systemctl restart apache2
curl -vv https://localhost
* Rebuilt URL to: https://localhost/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: unable to get local issuer certificate
* stopped the pause stream!
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
curl -vv --cacert localhost.crt https://localhost | head -n 20

#curl: (60) SSL certificate problem: unable to get local issuer certificate

# Use the myCA.pem certificate (using localhost.crt would create #curl: (60) SSL certificate problem: unable to get local issuer certificate)
curl -vv --cacert myCA.pem https://localhost | head -n 11
* Rebuilt URL to: https://localhost/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: myCA.pem
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [21 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [1125 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=PL; ST=Malopolskie; L=Krakow; O=My Company; OU=My Company Department; CN=localhost; emailAddress=me@localhost
*  start date: Nov  3 10:47:11 2020 GMT
*  expire date: Feb  6 10:47:11 2023 GMT
*  subjectAltName: host "localhost" matched cert's "localhost"
*  issuer: C=PL; ST=Malopolskie; L=Krakow; O=My Certificate Authority Company; OU=My Certificate Authority Company Department; CN=My CA; emailAddress=myCA@localhost
*  SSL certificate verify ok.
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
< HTTP/1.1 200 OK
< Date: Tue, 03 Nov 2020 10:57:17 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Last-Modified: Tue, 03 Nov 2020 10:37:09 GMT
< ETag: "2aa6-5b33171ec54b2"
< Accept-Ranges: bytes
< Content-Length: 10918
< Vary: Accept-Encoding
< Content-Type: text/html
< 
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
100 10918  100 10918    0     0  1332k      0 --:--:-- --:--:-- --:--:-- 1332k
* Connection #0 to host localhost left intact

Apache2 Ubuntu Default Page: It works
# Start firefox
firefox https://localhost
Warning: Potential Security Risk Ahead. Firefox detected a potential security threat and did not continue to localhost. If you visit this site, attackers could try to steal information like your passwords, emails, or credit card details.

#STOP firefox via ^C - REQUIRED to create firefox profile for the first time!
^CExiting due to channel error.
# Detect firefox profile in use
profile_in_use=$(cd ~/.mozilla/firefox && ls -lat | grep default | head -n 1 | sed 's/.*[[:space:]]\(.*default.*$\)/\1/g')
echo "profile is use: $profile_in_use"
# Run the Certificate Database Tool
#  add (-A) an existing certificate to a certificate database, specify the nickname (-n) of a certificate or key,
#  specify (-t) the trust attributes to modify in an existing certificate or to apply to a certificate when creating it or adding it to a database. There are three available trust categories for each certificate, expressed in this order: "SSL ,email ,object signing ":
#  c    Valid CA
#  T    Trusted CA to issue client certificates (implies c)
#  C    Trusted CA to issue server certificates (SSL only)(implies c)
certutil -A -n "My CA certificate" -t "TC,," -i myCA.pem -d sql:/$HOME/.mozilla/firefox/$profile_in_use

# List the certificates
certutil -d sql:/$HOME/.mozilla/firefox/$profile_in_use -L
Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI
DigiCert SHA2 Secure Server CA                               ,,
Amazon                                                       ,,
...
My CA certificate                                            CT,,

# Delete the certificate (do not execute it here)
# certutil -D -n "My CA certificate" -d sql:/$HOME/.mozilla/firefox/$profile_in_use
# Start firefox again
firefox https://localhost

Note: there are now 2 certificates (based on localhost.crt and myCA.pem).

# Install and run googlechrome
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt-get update
sudo apt-get install google-chrome-stable -y
google-chrome https://localhost
Your connection is not private. Attackers might be trying to steal your information from localhost (for example, passwords, messages, or credit cards). NET::ERR_CERT_AUTHORITY_INVALID

In Chrome's certificate manager, in the Authorities tab, import the myCA.pem certificate.

Now reload the page
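The import can also be scripted against Chrome's NSS database (a sketch under the assumption that Chrome on Linux reads $HOME/.pki/nssdb; certutil comes from the libnss3-tools package installed earlier):

certutil -d sql:$HOME/.pki/nssdb -A -t "TC,," -n "My CA certificate" -i myCA.pem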

It also works for https://10.0.2.15, which is explicitly specified in the certificate's Subject Alternative Name.

However, https://127.0.0.1 and https://ubuntu-vm are not listed in the Subject Alternative Name, so they are still flagged as insecure.

The same applies to Firefox.

Kafka console producer and consumer with Kerberos, Sentry and IDM

1. IDM – create the user login myuser
2. IDM – create the user groups kafka_dev_myuser_rw_grp and kafka_dev_myuser_ro_grp and add the user login myuser to those groups,
so that:

bash-4.2$ id myuser
uid=12345678(myuser) gid=12345678(myuser) groups=12345678(myuser),87654321(kafka_dev_myuser_rw_grp),87654320(kafka_dev_myuser_ro_grp)

3. Hue/Hive sql create role:

0: jdbc:hive2://host> CREATE ROLE kafka_dev_myuser_rw_role; 
0: jdbc:hive2://host> GRANT  ROLE kafka_dev_myuser_rw_role TO GROUP kafka_dev_myuser_rw_grp;
0: jdbc:hive2://host> SHOW ROLE GRANT GROUP kafka_dev_myuser_rw_grp;
+---------------------------+---------------+-------------+----------+--+
|           role            | grant_option  | grant_time  | grantor  |
+---------------------------+---------------+-------------+----------+--+
| kafka_dev_myuser_rw_role  | false         | NULL        | --       |
+---------------------------+---------------+-------------+----------+--+
0: jdbc:hive2://host> CREATE ROLE kafka_dev_myuser_ro_role; 
0: jdbc:hive2://host> GRANT  ROLE kafka_dev_myuser_ro_role TO GROUP kafka_dev_myuser_ro_grp;
0: jdbc:hive2://host> SHOW ROLE GRANT GROUP kafka_dev_myuser_ro_grp;
+---------------------------+---------------+-------------+----------+--+
|           role            | grant_option  | grant_time  | grantor  |
+---------------------------+---------------+-------------+----------+--+
| kafka_dev_myuser_ro_role  | false         | NULL        | --       |
+---------------------------+---------------+-------------+----------+--+
1 row selected (0.106 seconds)

4. Setup topic and producer and consumer group access for roles:

cd /run/cloudera-scm-agent/process/1234-kafka-KAFKA_BROKER/
kinit -kt kafka.keytab kafka/host.hadoop.com@HADOOP.COM
kafka-sentry -lp -r kafka_dev_myuser_rw_role
HOST=*->TOPIC=my-topic->action=write
HOST=*->TOPIC=my-topic->action=describe

or set the action to ALL (see the grant sketch below)

kafka-sentry -lp -r kafka_dev_myuser_ro_role
HOST=*->TOPIC=my-topic->action=describe
HOST=*->TOPIC=my-topic->action=read
HOST=*->CONSUMERGROUP=*->action=describe
HOST=*->CONSUMERGROUP=*->action=read

Optionally, the host and consumer group can be restricted to specific values:

HOST=127.0.0.1->TOPIC=my-topic->action=read
HOST=10.0.0.123->CONSUMERGROUP=my-flume-group->action=read
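The privileges listed above had to be granted to the roles beforehand; a hedged sketch, assuming the standard kafka-sentry grant options (-gpr/--grant_privilege_role with -p/--privilege); adjust role names and actions as needed:

kafka-sentry -gpr -r kafka_dev_myuser_rw_role -p "HOST=*->TOPIC=my-topic->action=write"
kafka-sentry -gpr -r kafka_dev_myuser_rw_role -p "HOST=*->TOPIC=my-topic->action=describe"
kafka-sentry -gpr -r kafka_dev_myuser_ro_role -p "HOST=*->TOPIC=my-topic->action=read"
kafka-sentry -gpr -r kafka_dev_myuser_ro_role -p "HOST=*->CONSUMERGROUP=*->action=read"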

5. Send data using the Kafka console producer:
Init a Kerberos keytab session:
kinit -kt /path/to/me.keytab me@HADOOP.COM
If that fails due to an expired password, change the password and re-create/generate the Kerberos keytab:
ipa-getkeytab -s my-kdc-server.hadoop.com -p me@HADOOP.COM -P -k me.keytab

Create jks file based on crt file:

keytool -importcert -file /path/to/gsnonpublicroot2.crt -keystore /path/to/my.jks 
Enter keystore password:  mypass1234
 
Trust this certificate? [no]:  yes
Certificate was added to keystore

producer.properties:

security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/path/to/my.jks
ssl.truststore.password=mypass1234

# jaas.conf contains the KafkaClient section shown in step 6
export KAFKA_OPTS="-Djava.security.auth.login.config=jaas.conf"
kafka-console-producer --broker-list host1:9093,host2:9093 --topic my-topic --producer.config producer.properties
20/12/14 03:49:05 INFO producer.ProducerConfig: ProducerConfig values: 
...
Debug is  true storeKey true useTicketCache false useKeyTab true doNotPrompt false ticketCache is null isInitiator true KeyTab is /path/to/me.keytab refreshKrb5Config is false principal is me@HADOOP.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
principal is me@HADOOP.COM
Will use keytab
Commit Succeeded 
...
20/12/14 03:49:06 INFO authenticator.AbstractLogin: Successfully logged in.
20/12/14 03:49:06 INFO utils.AppInfoParser: Kafka version : 1.0.1-kafka-3.1.1
20/12/14 03:49:06 INFO utils.AppInfoParser: Kafka commitId : unknown
>hello from bartek
>^

6. Read 2 messages using the Kafka console consumer:
A kinit ticket cache is not required (klist: No credentials cache found), because the keytab configured in jaas.conf is used.

consumer.properties:

security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI
#group.id=my-consumer-group
#auto.offset.reset=earliest

jaas.conf:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  debug=true
  storeKey=true
  useTicketCache=false
  renewTicket=true
  keyTab="/path/to/me.keytab"
  principal="me@HADOOP.COM";
};

export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/jaas.conf"
kafka-console-consumer --bootstrap-server host1:9093,host2:9093 \
  --consumer.config /path/to/consumer.properties --topic my-topic --from-beginning \
  --formatter kafka.tools.DefaultMessageFormatter \
  --property print.key=true --property print.value=true --max-messages 2
Debug is  true storeKey true useTicketCache false useKeyTab true doNotPrompt false ticketCache is null isInitiator true KeyTab is /path/to/me.keytab refreshKrb5Config is false principal is me@HADOOP.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
principal is me@HADOOP.COM
Will use keytab
Commit Succeeded
... followed by the consumed message data.

IntelliJ debugging of local Apache Flume with a netcat source and a logger sink

1. Download apache-flume-1.6.0-bin.tar.gz, uncompress it, and run Flume normally using the bash start script

me@MacBook:~/Downloads/apache-flume-1.6.0-bin$ bin/flume-ng agent \
  --conf conf --conf-file conf/example.conf --name a1 \
  -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true \
  -Dorg.apache.flume.log.rawdata=true

with netcat source and logger sink

me@MacBook:~/Downloads/apache-flume-1.6.0-bin$ cat conf/example.conf
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. Examine process command via ps -ef or watch the logs:

+ exec /Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true -cp '/Users/me/Downloads/apache-flume-1.6.0-bin/conf:/Users/me/Downloads/apache-flume-1.6.0-bin/lib/*' -Djava.library.path= org.apache.flume.node.Application --conf-file conf/example.conf --name a1

3. Add debugging options -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=5005,suspend=y:

me@MacBook:~/Downloads/apache-flume-1.6.0-bin$ /Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/bin/java \
  -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=5005,suspend=y \
  -Xmx20m -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true \
  -Dorg.apache.flume.log.rawdata=true \
  -cp '/Users/me/Downloads/apache-flume-1.6.0-bin/conf:/Users/me/Downloads/apache-flume-1.6.0-bin/lib/*' \
  -Djava.library.path= org.apache.flume.node.Application \
  --conf-file conf/example.conf --name a1
Listening for transport dt_socket at address: 5005

or when using kerberized kafka:

/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/bin/java \
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=5005,suspend=y \
-Xmx20m -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true \
-Dorg.apache.flume.log.rawdata=true \
-Dflume.log.dir=/tmp/flume -Dagent.id=localhost \
-Djava.security.auth.login.config=/Users/me/jaas.conf \
-Djava.security.krb5.conf=/Users/me/krb5.conf \
-cp '/Users/me/apache-flume-1.6.0-bin/conf:/Users/me/apache-flume-1.6.0-bin/lib/*' \
org.apache.flume.node.Application --conf-file conf/kafka-dev.conf --name a1

4. Create or use a Maven project with the flume-ng-core dependency:

<dependency>
  <groupId>org.apache.flume</groupId>
  <artifactId>flume-ng-core</artifactId>
  <version>1.6.0</version>
</dependency>

5. In IntelliJ create a remote debug configuration (Remote JVM Debug; host: localhost, port: 5005)

and click Debug to get connected to the process.

6. Set a breakpoint in the process() method of org.apache.flume.sink.LoggerSink.

7. Connect to the netcat source with telnet and send some sample text:

me@MacBook:~$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
aaaaa
OK

Apache flume

me@MacBook:~/Downloads/apache-flume-1.9.0-bin$ bin/flume-ng agent \
    --conf conf --conf-file conf/example2.conf --name a1 \
    -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true \
    -Dorg.apache.flume.log.rawdata=true
me@MacBook:~/Downloads/apache-flume-1.9.0-bin$ cat conf/example.conf
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1