Tag: java

Unable to use JStat with Cassandra

We have been having some problems with a Cassandra cluster, so I wanted to look at the java heap space. Unfortunately, jstat cannot find the pid. And, yes, it is the right PID!

Looking in /tmp/hsperfdata_cassandra/, there’s no file! Reading through the whole line where Cassandra is running, I noticed +PerfDisableSharedMem … that’d do it!

It looks like they intentionally set +PerfDisableSharedMem in the Cassandra startup script. I assume their rational is still reasonable … so wouldn’t remove the parameter for day-to-day operation. But, when there’s a problem … restarting Cassandra without this parameter allows us to check how garbage collection is going.

 

Java Heap Stats with JStat

While there are plenty of third-party utilities for looking at the java heap space, I just use jstat (in OpenJDK, this means installing java-<Version>-openjdk-devel

JStat will display the following columns:

--------------------------------------------------------------------------------
S0C: Survivor space 0 size in K
S1C: Survivor space 1 size in K

S0U: Survivor space 0 usage in K
S1U: Survivor space 1 usage in K

--------------------------------------------------------------------------------

EC: Eden space size in K
EU: Eden space usage in K

--------------------------------------------------------------------------------

OC: Old space size in K
OU: Old space usage in K

--------------------------------------------------------------------------------

MC: Meta space size in K
MU: Meta space usage in K

--------------------------------------------------------------------------------

CCSC: CodeCache size in K
CCSU: CodeCache usage in K

--------------------------------------------------------------------------------

YGC: Young generation garbage collection count
YGCT: Young generation garbage collection total time in seconds

FGC: Full garbage collection count
FGCT: Full garbage collection total time in seconds

CGC: Concurrent garbage collection count
CGCT: Concurrent garbage collection time in seconds
GCT: Total garbage collection time in seconds

--------------------------------------------------------------------------------

https://stackoverflow.com/questions/13660871/jvm-garbage-collection-in-young-generation/13661014#13661014 does a good job of explaining the nomenclature & how stuff gets moved around in the heap space

Sample output — this command is for java PID 19356 and will list 100 lines 2 seconds apart (2000 ms)

server01:bin # jstat -gc 19356 2000 100
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 0.0 64207.5 545344.0 319007.2 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 386674.5 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 457055.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 485538.8 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 505893.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815

And this is a time where a third-party tool would be helpful but I never really ‘get’ what is and what is not OK to install on servers, so try not to install things — because the *useful* bit of information for any of this is really the usage / size percent utilization value.

That last grouping of stuff — I look at those v/s how long the pid has been running. If you’ve gotten a billion GC’s and the PID has only been running for eight seconds, that is a crazy amount of I/O. If I’ve only had 3 GCs and the pid has been running for seven years, it hasn’t been doing anything. In between? I don’t really find the numbers useful unless I’ve got a baseline from normal operation.

ElastAlert2 SSL with OpenSearch 2.x

This turned out to be one of those situations where I went down a very complicated path for a very simple problem. We were setting up ElastAlert2 in our OpenSearch sandbox. I’ve used both the elasticsearch-py and opensearch-py modules with Python 3 to communicate with the cluster, so I didn’t anticipate any problems.

Which, of course, meant we had problems. A very cryptic message:

javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size

A quick perusal of the archive of all IT knowledge (aka Google) led me to a Java bug: https://bugs.openjdk.org/browse/JDK-8221218 which may or may not be resolved in the latest OpenJDK (which we are using). I say may or may not because it’s marked as resolved in some places but people report experiencing the bug after resolution was reported.

Fortunately, the OpenSearch server reported something more useful:

[2022-09-20T12:18:55,869][WARN ][o.o.h.AbstractHttpServerTransport] [UOS-OpenSearch.example.net] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/10.1.2.3:9200, remoteAddress=/10.1.2.4:55494}
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 504f5354202f656c617374616c6572745f7374617475735f6572726f722f5f646f6320485454502f312e310d0a486f73743a20

Which I’ve shortened because it was several thousand fairly random seeming characters. Except they aren’t random — that’s the communication hex encoded. Throwing the string into a hex decoder, I see the HTTP POST request.

Which … struck me as rather odd because it should be SSL encrypted rubbish. Turns out use_ssl was set to False! Evidently attempting to send clear text ‘stuff’ to an encrypted endpoint produces the same error as reported in the Java bug.

Setting use_ssl to true brought us to another adventure — an SSLCertVerificationError. We have set the verify_certs to false — even going so far as to go into util.py and modifying line 354 so the default is False. No luck. But there’s another config in each ElastAlert2 rule — http_post_ignore_ssl_errors — that actually does ignore certificate errors. One the rules were configured with http_post_ignore_ssl_errors, ElastAlert2 was able to communicate with the OpenSearch cluster and watch for triggering events.

ElasticSearch Java Heap Size — Order of Precedence

I was asked to look at a malfunctioning ElasticSearch server this week. It’s a lab sandbox, so not a huge deal … but still something they wanted online and functioning for some proof-of-concept testing. And, as a bonus, they’re willing to let us use their lab servers for our own sandboxing (IPv6 implementation, ELK upgrades). There were a handful of problems (it looks like the whole thing used to be a multi-node cluster but the second cluster node has vanished without any trace, the entire platform was re-IP’d, vm.max_map_count wasn’t set on the Docker server, and the logstash folder had a backup of a pipeline config in /path/to/logstash/pipeline/ … which still loads, so causes a continual stream of port-in-use exceptions). But the biggest problem was that any attempt at actually using the ElasticSearch server resulted in it falling over. Java crashed with an out of memory error because the heap space was exhausted. Now I’ve had really small sandboxes before — my first ES sandbox only had a gig of memory and was quite prone to this crash. But the lab server they’ve set up has 64GB of memory. So allocating a few gigs seemed like a quick solution.

The jvm.options file and jvm.options.d folder weren’t mounted into the container — they were the default files held within the container. Which seemed odd, and I made a mental note that it was something we’d need to either mount in or update again when the container gets updated. But no matter how much heap space I allocated, ES crashed.

I discovered that the Docker deployment set an ENV variable for ES_JAVA_OPTS — something which, per the ElasticSearch documentation, overrides all other JVM options. So no matter what I was putting into the jvm options file, the 256 meg set in the ENV was actually being used.

Luckily it’s not terribly difficult to modify the ENV’s within an existing container. You could, of course, redeploy the container with the new settings (and I’ll do that next time, since I’ve also got to get IPv6 enabled). But I wasn’t planning on making any other changes.

Manually Running a JAR File

The java code I now maintain is normally executed through a k8s cluster — this means just testing a quick change requires running the entire deployment pipeline. Sometimes, though, I really just want to test something quickly. In such instances, you can manually run a jar file using “java -jar my_file.jar” —

Maven Build Certificate Error

Attempting to build some Java code, I got a lot of errors indicating a trusted certificate chain was not available:

Could not transfer artifact 
org.springframework.boot:spring-boot-starter-parent:pom:2.2.0.RELEASE 
from/to repo.spring.io (<redacted>): sun.security.validator.ValidatorException: 
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: 
unable to find valid certification path to requested target

And

[ERROR] Failed to execute goal on project errorhandler: 
Could not resolve dependencies for project com.example.npm:errorhandler:jar:0.0.1-SNAPSHOT: 
The following artifacts could not be resolved: 
org.springframework.boot:spring-boot-starter-data-jpa:jar:2.3.7.BUILD-SNAPSHOT, 
org.springframework.boot:spring-boot:jar:2.3.7.BUILD-SNAPSHOT, 
org.springframework.boot:spring-boot-configuration-processor:jar:2.3.7.BUILD-SNAPSHOT: 
Could not transfer artifact org.springframework.boot:spring-boot-starter-data-jpa:jar:2.3.7.BUILD-20201211.052207-37 
from/to spring-snapshots (https://repo.spring.io/snapshot): 
transfer failed for https://repo.spring.io/snapshot/org/springframework/boot/spring-boot-starter-data-jpa/2.3.7.BUILD-SNAPSHOT/spring-boot-starter-data-jpa-2.3.7.BUILD-20201211.052207-37.jar: 
Certificate for <repo.spring.io> doesn't match any of the subject alternative names: [] -> [Help 1]

Ideally, you could just add whatever cert(s) needed to be trusted into the cacerts file for the Java instance using keytool (.\keytool.exe -import -alias digicert-intermed -cacerts -file c:\tmp\digi-int.cer) however the work computers are locked down such that I am unable to import certs into the Java trust store. The second error makes me think it wouldn’t work anyway — if there’s no matching SAN on the cert, trusting the cert wouldn’t do anything.

Fortunately, there are a few flags you can add to mvn to ignore certificate errors — thus allowing the build to complete without requiring access to the cacerts file. There is, of course, a possibility that the trust failure is because your connection is being redirected maliciously … but I see enough other people getting trust failures for this spring-boot stuff (and visiting the site doesn’t show anything suspect) that I’m happy to bypass the security validation this once and just be done with the build 🙂

mvn package -DskipTests -Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true jib:build

Decompiling Jython Class Files

Looks like Jython that is compiled into a class file can be decompiled just like a Java class (I use jd-cmd which is both simple and open source). But … you don’t get back Python. In a disaster recovery scenario, you get back something and could reconstruct your Python code from the Java-looking stuff you get back.

I don’t normally type the entire command — a quick function in your .bashrc gives you a command alias that can be used instead.

Persisting Port Names For OpenHAB

If you only have one device connected to your Linux box, your controller may well always be assigned the same port when the system is rebooted. In the real world, multiple connected devices make this unlikely. We use udev rules to create symlinks used within the OpenHAB configuration. The udev rule essentially uses attributes of the device to identify the real port and creates a statically named symlink to that port on boot. So the first thing you need to identify is something unique about the device. We have several video capture cards and a Z-Wave/ZigBee combo controller attached to our server. If you do not have udevinfo, you can use lsusb to find details about the device. First list the devices, then identify the proper one and use the -d switch with the ????:???? formatted ID number. The -v switch outputs verbose information. Find a unique attribute or set of attributes that create a unique identifier. In this case, the interface name is unique and we can stop there.

[lisa@server ~]#  lsusb
Bus 001 Device 004: ID 0bda:0111 Realtek Semiconductor Corp. RTS5111 Card Reader Controller
Bus 001 Device 003: ID 1058:1230 Western Digital Technologies, Inc. My Book (WDBFJK0030HBK)
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 10c4:8a2a Cygnal Integrated Products, Inc.
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
[lisa@server ~]# lsusb -d 10c4:8a2a -v
Bus 002 Device 002: ID 10c4:8a2a Cygnal Integrated Products, Inc.
Device Descriptor:
 bLength 18
 bDescriptorType 1
 bcdUSB 2.00
 bDeviceClass 0
 bDeviceSubClass 0
 bDeviceProtocol 0
 bMaxPacketSize0 64
 idVendor 0x10c4 Cygnal Integrated Products, Inc.
 idProduct 0x8a2a
 bcdDevice 1.00
 iManufacturer 1 Silicon Labs
 iProduct 2 HubZ Smart Home Controller
 iSerial 5 90F0016B
 bNumConfigurations 1
 Configuration Descriptor:
 bLength 9
 bDescriptorType 2
 wTotalLength 55
 bNumInterfaces 2
 bConfigurationValue 1
 iConfiguration 0
 bmAttributes 0x80
 (Bus Powered)
 MaxPower 100mA
 Interface Descriptor:
 bLength 9
 bDescriptorType 4
 bInterfaceNumber 0
 bAlternateSetting 0
 bNumEndpoints 2
 bInterfaceClass 255 Vendor Specific Class
 bInterfaceSubClass 0
 bInterfaceProtocol 0
 iInterface 3 HubZ Z-Wave Com Port
 Endpoint Descriptor:
 bLength 7
 bDescriptorType 5
 bEndpointAddress 0x81 EP 1 IN
 bmAttributes 2
 Transfer Type Bulk
 Synch Type None
 Usage Type Data
 wMaxPacketSize 0x0040 1x 64 bytes
 bInterval 0
 Endpoint Descriptor:
 bLength 7
 bDescriptorType 5
 bEndpointAddress 0x01 EP 1 OUT
 bmAttributes 2
 Transfer Type Bulk
 Synch Type None
 Usage Type Data
 wMaxPacketSize 0x0040 1x 64 bytes
 bInterval 0
 Interface Descriptor:
 bLength 9
 bDescriptorType 4
 bInterfaceNumber 1
 bAlternateSetting 0
 bNumEndpoints 2
 bInterfaceClass 255 Vendor Specific Class
 bInterfaceSubClass 0
 bInterfaceProtocol 0
 iInterface 4 HubZ ZigBee Com Port
 Endpoint Descriptor:
 bLength 7
 bDescriptorType 5
 bEndpointAddress 0x82 EP 2 IN
 bmAttributes 2
 Transfer Type Bulk
 Synch Type None
 Usage Type Data
 wMaxPacketSize 0x0020 1x 32 bytes
 bInterval 0
 Endpoint Descriptor:
 bLength 7
 bDescriptorType 5
 bEndpointAddress 0x02 EP 2 OUT
 bmAttributes 2
 Transfer Type Bulk
 Synch Type None
 Usage Type Data
 wMaxPacketSize 0x0020 1x 32 bytes
 bInterval 0

We then need to create a file under /etc/udev/rules.d. The file name begins with a number that is used for a load order – I generally number my custom files 99 to avoid interfering with system operations. The bit in “ATTRS” is the attribute name and value to match. The KERNEL section contains the search domain (i.e. look at all of the ttyUSB### devices and find ones where the interface is this). The symlink bit is the name you want to use (more on this later). Set the group and mode to ensure OpenHAB is able to use the symlink.

[lisa@server rules.d]# cat 99-server.rules
KERNEL=="ttyUSB[0-9]*", ATTRS{interface}=="HubZ Z-Wave Com Port", SYMLINK+="ttyUSB-5", GROUP="dialout", MODE="0666"
KERNEL=="ttyUSB[0-9]*", ATTRS{interface}=="HubZ ZigBee Com Port", SYMLINK+="ttyUSB-55", GROUP="dialout", MODE="0666"

The symlink name can be anything – when I created udev rules for our video capture cards, I named them something immediately obvious: video-hauppauge250 is the Hauppauge 250 card. Tried to do the same thing here, naming the ports controller-zigbee; while the symlink appeared and had the expected ownership and permissions … OpenHAB couldn’t use it.

Turns out there’s a nuance to Java RxTx where non-standard port names need to be accommodated in the java options. So I could have added -Dgnu.io.rxtxSerialPorts=/dev/controller-zigbee and -Dgnu.io.rxtxSerialPorts=/dev/controller-zwave to the OpenHAB startup and been OK (in theory), it was far easier to name the symlinks using the Linux standard conventions. Hence I have symlinks named ttyUSBsomethingsomethingsomething.Â