Author: Bartosz Wieczorek

Debugging local Tomcat web applications

Project structure:

./context.xml
./pom.xml
./src
./src/main
./src/main/java
./src/main/java/com
./src/main/java/com/bawi
./src/main/java/com/bawi/ConnectionUtils.java
./src/main/webapp
./src/main/webapp/index.jsp
./src/main/webapp/WEB-INF
./src/main/webapp/WEB-INF/web.xml

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.bawi</groupId>
    <artifactId>my-webapp-template</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <build>
        <finalName>my-webapp-template</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.tomcat.maven</groupId>
                <artifactId>tomcat7-maven-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <path>/${project.build.finalName}</path>
                    <contextFile>context.xml</contextFile>
                    <username>admin</username>
                    <password>admin</password>
                </configuration>
                <dependencies>
                    <dependency>
                        <groupId>org.hsqldb</groupId>
                        <artifactId>hsqldb</artifactId>
                        <version>2.4.0</version>
                    </dependency>
                </dependencies>
            </plugin>
        </plugins>
    </build>

</project>

1. First option: standalone Tomcat (works with Tomcat 7 and 8)

In this option, update the Tomcat bin/startup.sh to:

export JPDA_ADDRESS=8000
export JPDA_TRANSPORT=dt_socket

#exec "$PRGDIR"/"$EXECUTABLE" start "$@"
exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@"

If needed, add the driver JARs to the standalone Tomcat lib folder, update context.xml (if non-standard) in the standalone Tomcat conf folder, and restart Tomcat.

Then build the my-webapp-template.war file with 'mvn clean package' and manually copy (redeploy) it to the standalone Tomcat webapps folder,

or, even better, redeploy the WAR via:

mvn clean tomcat7:redeploy

For that, add a username/password (or a server reference) to the tomcat7-maven-plugin configuration so it can access the standalone Tomcat manager via the manager-script role, and update tomcat-users.xml:

    <role rolename="manager-gui"/>
    <role rolename="manager-script"/>
    <user username="admin" password="admin" roles="manager-gui,manager-script"/>

For debugging in IntelliJ: Run menu -> Attach to Local Process, then set a breakpoint in the Java code.

2. Second option: embedded Tomcat

In this option the tomcat7-maven-plugin requires adding the driver dependencies as plugin dependencies and specifying context.xml if it is non-standard.

IntelliJ: in the Maven Projects tab, under Plugins, right-click on

tomcat7:run

and choose Debug, then set a breakpoint. In this option, if the Java source classes change, you can use IntelliJ -> Run -> Reload Changed Classes.

Apache Tomcat integration with HSQLDB, Derby DB and PostgreSQL

HSQL DB:

Download the latest HSQLDB zip, extract it, start HSQLDB, and let it automatically create a database:

me@MacBook:~/dev/env/hsqldb-2.4.0/hsqldb$ java -cp lib/hsqldb.jar org.hsqldb.server.Server --database.0 file:mydatabases/mydb --dbname.0 mydb

Connect to the database from the console (empty password, just hit Enter):

me@MacBook:~/dev/env/hsqldb-2.4.0/hsqldb$ java -jar lib/sqltool.jar --inlineRc=url=jdbc:hsqldb:hsql://localhost/mydb,user=sa
 Enter password for sa:
 SqlTool v. 5736.
 JDBC Connection established to a HSQL Database Engine v. 2.4.0 database
 as "SA" with R/W TRANSACTION_READ_COMMITTED Isolation.

sql>
 CREATE SCHEMA MY_SCHEMA;
 CREATE TABLE MY_SCHEMA.MY_TABLE (email VARCHAR(50));
 INSERT INTO MY_SCHEMA.MY_TABLE VALUES ('abc@efg.hi');
 COMMIT;

sql> SELECT * FROM MY_SCHEMA.MY_TABLE;
 email
 ----------------------------------
 abc@efg.hi

Tomcat:
copy hsqldb/lib/hsqldb.jar to tomcat lib/

DERBY DB:

Download it from https://db.apache.org/derby/derby_downloads.html, extract it, and start it as a standalone server:

me@MacBook:~/dev/env$ tar xzf db-derby-10.14.1.0-bin.tar.gz

me@MacBook:~/dev/env/db-derby-10.14.1.0-bin$ java -jar lib/derbyrun.jar server start
Tue Apr 24 10:02:35 CEST 2018 : Security manager installed using the Basic server security policy.
Tue Apr 24 10:02:35 CEST 2018 : Apache Derby Network Server - 10.14.1.0 - (1808820) started and ready to accept connections on port 1527

me@MacBook:~/dev/env/db-derby-10.14.1.0-bin$ java -cp lib/derbytools.jar:lib/derby.jar:lib/derbyclient.jar org.apache.derby.tools.ij 
ij version 10.14
ij> connect 'jdbc:derby://localhost:1527/mydb;create=true';
ij> CREATE SCHEMA MY_SCHEMA;
0 rows inserted/updated/deleted
ij> CREATE TABLE MY_SCHEMA.MY_TABLE (email VARCHAR(50));
0 rows inserted/updated/deleted
ij> INSERT INTO MY_SCHEMA.MY_TABLE VALUES ('abc@efg.hi');
1 row inserted/updated/deleted
ij> SELECT * FROM MY_SCHEMA.MY_TABLE;
EMAIL 
--------------------------------------------------
abc@efg.hi 

1 row selected
ij> exit;

me@MacBook:~/dev/env/db-derby-10.14.1.0-bin$ ls mydb/
README_DO_NOT_TOUCH_FILES.txt dbex.lck seg0 tmp
db.lck log service.properties

# I have stopped the server and started again #

me@MacBook:~/dev/env/db-derby-10.14.1.0-bin$ java -cp lib/derbytools.jar:lib/derby.jar:lib/derbyclient.jar org.apache.derby.tools.ij 
ij version 10.14
ij> connect 'jdbc:derby://localhost:1527/mydb';
ij> SELECT * FROM MY_SCHEMA.MY_TABLE; 
EMAIL 
--------------------------------------------------
abc@efg.hi  

Note: the CREATE TABLE statement used a lowercase email column name, but Derby converted it to uppercase EMAIL.

Tomcat:
copy db-derby-10.14.1.0-bin/lib/derbyclient.jar to tomcat lib/

POSTGRES:

me@ubuntu-vm:~$ sudo apt-get install postgresql
me@ubuntu-vm:~$ sudo -u postgres createuser --interactive
Enter name of role to add: me
Shall the new role be a superuser? (y/n) y
me@ubuntu-vm:~$ sudo -u postgres createdb me
me@ubuntu-vm:~$ psql
psql (9.5.12)
Type "help" for help.

me=# \password
Enter new password: 
Enter it again:

me=# CREATE DATABASE mydb;
CREATE DATABASE

me@ubuntu-vm:~$ psql -d mydb

mydb=# \conninfo
You are connected to database "mydb" as user "me" via socket in "/var/run/postgresql" at port "5432".

mydb=# CREATE SCHEMA MY_SCHEMA;
CREATE SCHEMA
mydb=# CREATE TABLE MY_SCHEMA.MY_TABLE (email VARCHAR(50));
CREATE TABLE
mydb=# INSERT INTO MY_SCHEMA.MY_TABLE VALUES ('abc@efg.hi');
INSERT 0 1
mydb=# SELECT * FROM MY_SCHEMA.MY_TABLE;
 email 
------------
 abc@efg.hi
(1 row)

Allow listening on all interfaces:
me@ubuntu-vm:~$ sudo vim /etc/postgresql/9.5/main/postgresql.conf 
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
listen_addresses = '*'
#listen_addresses = 'localhost' # what IP address(es) to listen on;

Allow connections from all networks:
me@ubuntu-vm:~$ sudo vim /etc/postgresql/9.5/main/pg_hba.conf
# IPv4 local connections:
#host all all 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

me@ubuntu-vm:~$ sudo /etc/init.d/postgresql restart
[ ok ] Restarting postgresql (via systemctl): postgresql.service.

In VirtualBox: Settings -> Network -> Enable Network Adapter (Attached to: NAT) -> Port Forwarding: add a rule with Name: Postgresql, Protocol: TCP, Host IP: (empty), Host Port: 5432, Guest IP: (empty), Guest Port: 5432.

Tomcat:
copy postgresql-42.2.2.jar to tomcat lib/

src/main/webapp/WEB-INF/web.xml, followed by context.xml (./context.xml for the embedded option, tomcat/conf/context.xml for standalone Tomcat):

<!DOCTYPE web-app PUBLIC
 "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
 "http://java.sun.com/dtd/web-app_2_3.dtd" >

<web-app>
    <display-name>Archetype Created Web Application</display-name>

    <resource-ref>
        <res-ref-name>jdbc/mydb</res-ref-name>
        <res-type>javax.sql.DataSource</res-type>
        <res-auth>Container</res-auth>
    </resource-ref>

</web-app>
context.xml:

<Context>
...
    <Resource name="jdbc/mydb" auth="Container"
        type="javax.sql.DataSource"
        validationQuery="select 1 from INFORMATION_SCHEMA.SYSTEM_USERS" 
        maxActive="20" maxIdle="30" maxWait="10000"
        username="SA"
        driverClassName="org.hsqldb.jdbc.JDBCDriver"
        url="jdbc:hsqldb:hsql://localhost/mydb"/>
    <!-- 
    <Resource name="jdbc/mydb" auth="Container"
        type="javax.sql.DataSource"
        validationQuery="select 1 from sysibm.sysdummy1"
        maxActive="20" maxIdle="30" maxWait="10000"
        driverClassName="org.apache.derby.jdbc.ClientDriver"
        url="jdbc:derby://localhost:1527/mydb"/>
    -->
    <!-- 
    <Resource name="jdbc/mydb" scope="Container"
        type="javax.sql.DataSource"
        validationQuery="select 1"
        maxActive="20" maxIdle="30" maxWait="10000"
        username="me" password="abc123"
        driverClassName="org.postgresql.Driver"
        url="jdbc:postgresql://localhost:5432/mydb"/>
    -->
</Context>

src/main/webapp/index.jsp:

<%@ page import="com.bawi.ConnectionUtils" %>
<html>
    <body>
        <h2>Hi: <%= ConnectionUtils.getElements() %></h2>
    </body>
</html>

src/main/java/com/bawi/ConnectionUtils.java:

 

package com.bawi;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ConnectionUtils {
    public static List<String> getElements() {
        List<String> list = new ArrayList<String>();
        try {
            // look up the container-managed DataSource configured in context.xml
            Context ctx = (Context) new InitialContext().lookup("java:comp/env");
            DataSource ds = (DataSource) ctx.lookup("jdbc/mydb");
            // try-with-resources closes the connection, statement and result set even on failure
            try (Connection conn = ds.getConnection();
                 Statement stmt = conn.createStatement();
                 ResultSet res = stmt.executeQuery("SELECT EMAIL FROM MY_SCHEMA.MY_TABLE")) {
                while (res.next()) {
                    list.add(res.getString("EMAIL"));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }
}

Output:

Hi: [abc@efg.hi]

 

 

Spark RDD vs DataFrame vs Dataset

$ cat a.json

{"age":19,"name":"Alice"}
{"age":20,"name":"Bob"}

scala> spark.read.json("/Users/me/a.json")
res1: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> res1.printSchema
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)

scala> res1.show
+---+-----+
|age| name|
+---+-----+
| 19|Alice|
| 20| Bob|
+---+-----+

scala> res1.filter("age > 19").show
+---+----+
|age|name|
+---+----+
| 20| Bob|
+---+----+

scala> case class Person(age: BigInt, name: String)
defined class Person

scala> res1.as[Person]
res7: org.apache.spark.sql.Dataset[Person] = [age: bigint, name: string]

scala> res7.filter(_.age > 19)
res8: org.apache.spark.sql.Dataset[Person] = [age: bigint, name: string]

scala> res8.show
+---+----+
|age|name|
+---+----+
| 20| Bob|
+---+----+
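
For comparison, the same file can also be processed as a plain RDD (the third part of the comparison in the title). A minimal sketch, assuming the same spark-shell session and file path as above; with an RDD there is no schema, so the age has to be extracted by manual string parsing:

val lines = spark.sparkContext.textFile("/Users/me/a.json")
// no schema or typed columns: pull the age out of the raw JSON line with a regex
val ages = lines.map(line => "\"age\":(\\d+)".r.findFirstMatchIn(line).map(_.group(1).toInt).getOrElse(0))
ages.filter(_ > 19).collect()   // Array(20) for the data above

Unlike the DataFrame and Dataset versions, there is no schema and no compile-time Person type here, which is the main trade-off this comparison illustrates.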

 

Spark cache

package com.bawi.spark.scala

import org.apache.spark.sql.SparkSession

object MySpark2Cache {

  def main(args: Array[String]): Unit = {
    val master = if (args.length == 1) args(0) else "local[*]"
    val sparkSession = SparkSession.builder().appName("MySpark2Cache").master(master).getOrCreate()
    val sc = sparkSession.sparkContext

    val array = 1.to(10).toArray
    val rdd = sc.parallelize(array)
    // print each element as it passes through the map, to see how many times it is evaluated
    val mapped = rdd.map(n => { println(s"Mapping n=$n"); n })
    // cache the mapped RDD so both downstream actions reuse the computed partitions
    val cached = mapped.cache

    val even = cached.filter(_ % 2 == 0).map(n => { Thread.sleep(1000); n }).collect()
    val odd = cached.filter(_ % 2 != 0).map(n => { Thread.sleep(1000); n }).collect()

    println(s"even=${even.mkString(",")}")
    println(s"odd=${odd.mkString(",")}")
  }
}

output:

Mapping n=9
Mapping n=2
Mapping n=6
Mapping n=4
Mapping n=8
Mapping n=5
Mapping n=10
Mapping n=1
Mapping n=3
Mapping n=7
even=2,4,6,8,10
odd=1,3,5,7,9

When we remove the caching, i.e. replace

val cached = mapped.cache

with

val cached = mapped

then we get:

Mapping n=2
Mapping n=6
Mapping n=9
Mapping n=7
Mapping n=4
Mapping n=3
Mapping n=10
Mapping n=8
Mapping n=1
Mapping n=5
Mapping n=4
Mapping n=5
Mapping n=1
Mapping n=2
Mapping n=3
Mapping n=7
Mapping n=9
Mapping n=6
Mapping n=8
Mapping n=10
even=2,4,6,8,10
odd=1,3,5,7,9

So without caching, the map function is executed twice, once per action (each collect), because the RDD lineage is recomputed for every action.
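
Once both actions have run, the cached partitions can be released explicitly; a minimal sketch, reusing the cached value from the example above:

// free the storage used by the cached partitions when the RDD is no longer needed
cached.unpersist()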

Grafana alerting

Scenario:

Alert condition: value < 10.0

Sent data points: 11.0, 10.5, 9.5, 9.0, 11.0

Alert evaluation triggered every: 60s

Time range used in evaluation: 1m

If no data present in the time range evaluated: keep current state

 

 

[Screenshots: Screen Shot 2018-03-27 at 13.05.09, Screen Shot 2018-03-27 at 13.11.33]

At 11:30:45 the first data point (value OK) arrives.

At 11:31:21 the alert evaluation triggers – note the green vertical line (around 40s after the first data point), which means Grafana entered the green OK state and an OK notification was sent.

At 11:32:11 the second data point (value OK) arrives, but no notification is sent since Grafana is already in the OK state.

At 11:34:55 the third data point arrives, which is below the expected threshold. Note there was a 2 min 44 s no-data period, but no notification was generated since Grafana is configured to keep the last state on no data.

At 11:35:21 the alert evaluation triggers and detects a value below the threshold; an alert notification is sent and Grafana enters the alerting state.

At 11:37:15 the fourth data point is sent (value below threshold), but no alert notification is sent since Grafana was already in the alerting state.

At 11:38:38 the fifth data point is sent with an OK value.

At 11:40:21 the alert evaluation triggers, Grafana sets the state back to OK, and an OK notification is sent.

 

Publish system and custom Apache Spark metrics to Grafana via the Spark GraphiteSink and the InfluxDB Graphite endpoint on Ubuntu

Goal: see Apache Spark application metrics and our custom Spark metrics in Grafana.

[Screenshot: Screen Shot 2018-03-16 at 15.37.57]
1. Add the spark-metrics dependency to the Spark project pom.xml:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>spark-metrics</artifactId>
    <version>2.0.0</version>
</dependency>

2. Initialize UserMetricsSystem and add a gauge:

import org.apache.spark.groupon.metrics.{SparkGauge, UserMetricsSystem}
import org.apache.spark.sql.SparkSession

object MySparkApp {

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("MySparkApp")
      .master("local[*]")
      .getOrCreate()

    // initialize the library's metrics system; "custom_metrics" becomes the metric namespace
    // (the gauge shows up in the sinks as <app-id>.driver.MySparkApp.custom_metrics.MyGauge)
    UserMetricsSystem.initialize(sparkSession.sparkContext, "custom_metrics")

    println(sparkSession.sparkContext.getConf.getAll.deep.mkString)

    // create a named gauge and set a value; it is reported through the configured metric sinks
    val myGauge: SparkGauge = UserMetricsSystem.gauge("MyGauge")
    myGauge.set(123)

    sparkSession.stop()
  }

}

3. Create metrics.properties in src/main/resources with:

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=2003
*.sink.graphite.period=5
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=mysparkappprefix

# Enable jvm source for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
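
This works when running locally from the IDE because Spark can pick metrics.properties up from the classpath. When the application is submitted to a cluster the file may not be on the classpath; in that case the location can be passed explicitly via the standard spark.metrics.conf property. A minimal sketch, assuming the file is shipped alongside the job (e.g. with spark-submit --files metrics.properties):

val sparkSession = SparkSession.builder()
  .appName("MySparkApp")
  .config("spark.metrics.conf", "metrics.properties") // explicit path instead of classpath lookup
  .getOrCreate()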

4. Install and configure InfluxDB:

wget https://dl.influxdata.com/influxdb/releases/influxdb_1.2.4_amd64.deb
 sudo dpkg -i influxdb_1.2.4_amd64.deb
 sudo systemctl enable influxdb
 sudo systemctl start influxdb
 systemctl status influxdb
 sudo vim /etc/influxdb/influxdb.conf

to:
– enable admin UI:

[admin]
 # Determines whether the admin service is enabled.
 enabled = true

– enable graphite endpoint and change database name to graphite_metrics:

 [[graphite]]
 # Determines whether the graphite endpoint is enabled.
 enabled = true
 database = "graphite_metrics"

– restart influxdb:

 sudo systemctl restart influxdb
 sudo systemctl status influxdb

– view the InfluxDB logs:

sudo journalctl -u influxdb.service

Note: if you are running InfluxDB in VirtualBox and you want to expose the Graphite endpoint to your host operating system (e.g. Mac or Windows), then add port forwarding to your network NAT settings and expose port 2003 in the host and guest port fields (IP addresses can be left empty).

5. Run the main method of the Apache Spark application and watch the Spark system metrics and our custom MyGauge metric in the Spark web UI at http://localhost:4040/metrics/json (Spark needs to be running, so temporarily add Thread.sleep(60000) to give yourself time to view the HTTP metrics endpoint):
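
For example, a minimal sketch against the MySparkApp code above; the sleep is temporary and only keeps the application (and its web UI) alive long enough for inspection:

    myGauge.set(123)
    Thread.sleep(60000) // temporary: keep http://localhost:4040/metrics/json reachable
    sparkSession.stop()

With the application still running, the metrics endpoint returns output like: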

{
 "version": "3.0.0",
 "gauges": {
 "local-1521193939403.driver.BlockManager.disk.diskSpaceUsed_MB": {
 "value": 0
 },
 ...
 "local-1521193939403.driver.DAGScheduler.job.activeJobs": {
 "value": 0
 },
 ...
 "local-1521193939403.driver.MySparkApp.custom_metrics.MyGauge": {
 "value": 123
 },
 ...
 "local-1521193939403.driver.jvm.PS-MarkSweep.count": {
 "value": 2
 },
 ...

6. Connect to the InfluxDB console and view the received measurements (subset listed):

$ influx
 Connected to http://localhost:8086 version 1.2.4
 InfluxDB shell version: 1.2.4
 > use graphite_metrics
 Using database graphite_metrics
 > show measurements
 name: measurements
 name
 ----
 mysparkappprefix.local-1521193939403.driver.BlockManager.disk.diskSpaceUsed_MB
 mysparkappprefix.local-1521193939403.driver.BlockManager.memory.maxMem_MB
 mysparkappprefix.local-1521193939403.driver.CodeGenerator.compilationTime.count
 mysparkappprefix.local-1521193939403.driver.DAGScheduler.job.activeJobs

 mysparkappprefix.local-1521193939403.driver.MySparkApp.custom_metrics.MyGauge

 mysparkappprefix.local-1521193939403.driver.jvm.PS-MarkSweep.count
 mysparkappprefix.local-1521193939403.driver.jvm.heap.used
 mysparkappprefix.local-1521193939403.driver.jvm.non-heap.committed
 mysparkappprefix.local-1521193939403.driver.jvm.non-heap.used
 mysparkappprefix.local-1521193939403.driver.jvm.pools.Code-Cache.committed
 mysparkappprefix.local-1521193939403.driver.jvm.total.used
 ...

7. Install and configure Grafana:

wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_4.3.1_amd64.deb
sudo apt-get install -y adduser libfontconfig
sudo dpkg -i grafana_4.3.1_amd64.deb
sudo systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server

– log in to the Grafana UI as admin/admin at http://localhost:3000/login

Note: you may want to forward the Grafana UI port 3000 from the VirtualBox guest so it can be viewed on your host machine.

– add a new datasource: name: Graphite, type: InfluxDB, HTTP URL: http://localhost:8086, access: proxy, InfluxDB database: graphite_metrics, hit Save & Test
– add a new dashboard with a graph panel, then click 'Panel Title' above the graph to edit it,
— in the Metrics tab: select your datasource, click 'select measurement' (in the FROM clause) to see the list of metrics, choose any of them to add to the graph, and optionally set a new name in ALIAS BY
— in the Display tab, under Draw Modes, check Points
– save the dashboard

8. Note that for each Spark execution we get a long list of metrics in the dropdown. We can aggregate that list by using templates in the InfluxDB Graphite config, where each template maps the dot-separated Graphite path to a measurement name plus tags:

sudo vim /etc/influxdb/influxdb.conf:
 [[graphite]]
 ...
 templates = [
 "*.*.*.jvm.*.* application.app_id.executor_id.measurement.mem_type.qty name=jvm",
 "*.*.*.jvm.pools.*.* application.app_id.executor_id.measurement.measurement.mem_type.qty name=jvm_pools",
 "*.*.*.BlockManager.*.* application.app_id.executor_id.measurement.type.qty name=BlockManager",
 "*.*.*.DAGScheduler.*.* application.app_id.executor_id.measurement.type.qty name=DAGScheduler",
 "*.*.*.CodeGenerator.*.* application.app_id.driver.measurement.type.qty name=CodeGenerator",
 "*.*.*.HiveExternalCatalog.*.* application.app_id.driver.measurement.type.qty name=HiveExternalCatalog",
 "*.*.*.*.custom_metrics.* measurement.app_id.driver.application.measurement.name"
 ]
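
For example, the Graphite metric mysparkappprefix.local-1521200033096.driver.MySparkApp.custom_metrics.MyGauge matches the last template: the measurement becomes mysparkappprefix.custom_metrics, and app_id, driver, application and name become tags, which is exactly the column layout shown in point 11 below.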

and restart influxdb:

 sudo systemctl restart influxdb
 sudo systemctl status influxdb

9. To see only new results, delete the existing measurements from the InfluxDB database:

$ influx
 > use graphite_metrics
 > delete where time > '2016-01-01'
 > show measurements

10. Run the Spark app again and view the aggregated measurements:

> show measurements
 name: measurements
 name
 ----
 BlockManager
 CodeGenerator
 DAGScheduler
 HiveExternalCatalog
 jvm
 jvm.pools
 mysparkappprefix.custom_metrics

11. Select an individual metric:

> select * from "mysparkappprefix.custom_metrics"
 name: mysparkappprefix.custom_metrics
 time app_id application driver name value
 ---- ------ ----------- ------ ---- -----
 1521198987000000000 local-1521198986819 MySparkApp driver MyGauge 123

> delete where time > '2016-01-01'
 > SELECT value FROM graphite_metrics.autogen."mysparkappprefix.custom_metrics" WHERE "name" = 'MyGauge' AND application = 'MySparkApp' AND driver = 'driver'
 name: mysparkappprefix.custom_metrics
 time value
 ---- -----
 1521200033000000000 123

Grafana query:
 > SELECT mean(value) FROM graphite_metrics.autogen."mysparkappprefix.custom_metrics" WHERE "name" = 'MyGauge' AND application = 'MySparkApp' AND driver = 'driver' AND time > now() - 5m GROUP BY time(200ms)
 name: mysparkappprefix.custom_metrics
 time mean
 ---- ----
 1521200012800000000
 1521200013000000000
 ...
 1521200033000000000 123
 1521200033200000000
 ...

HTTP InfluxDB query:
http://localhost:8086/query?q=select+*+from+%22graphite_metrics%22..%22mysparkappprefix.custom_metrics%22+where+time+%3E+%272018-03-15T15%3A00%3A18Z%27

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "mysparkappprefix.custom_metrics",
          "columns": [
            "time",
            "app_id",
            "application",
            "driver",
            "name",
            "value"
          ],
          "values": [
            [
              "2018-03-16T11:33:53Z",
              "local-1521200033096",
              "MySparkApp",
              "driver",
              "MyGauge",
              123
            ]
          ]
        }
      ]
    }
  ]
}

[Screenshot: Screen Shot 2018-03-16 at 15.34.53]