Category: Technology

Agile Methodology Is Not Anarchy

For the past several years, my employer has been moving toward an Agile development methodology. There are some challenges when mapping this methodology into operations because it’s not the same thing; but, surprisingly, those are not where I have experienced challenges. The biggest challenge during this transition is some of my coworkers seem to think the methodology is that there are no rules.

A friend of mine, a fairly eccentric history professor, used to say that a little knowledge is a dangerous thing but you’ve got to emphasize LITTLE. And it seems like we’re encountering a situation where Phil’s emphasis holds true: the only thing garnered from from Agile training is that the documentation and process from waterfall projects are no more. But breaking away from the large-scale view of a P.R.O.J.E.C.T. for Agile development is a bit like breaking a monolithic application out into microservices — it still needs to do all of the same ‘stuff’, it just does it differently. And there are still policies and procedures — even a microservice team is going to have a coding standard, a process for handling merges, a way of scheduling time off, and some basic idea of what their application needs to accomplish. Sure, the app’s design will change incrementally over time. But it’s not an emergent property like chaos/complexity theory.

Maybe the “what Agile means to me” mentality comes from failing to clearly map a development methodology into an operations framework. Maybe it’s just a good excuse to avoid components of work that they do not enjoy. To avoid “agile operations” becoming “no boring planning stuff!!!”, I’ve outlined ways in which the Scrum methodologies the company wishes to adopt can be used to streamline operations. It helps that our group is reorganizing into an operational/support group and an architecture/design group — I see a lot of places within the operations team where Scrum approaches make sense.

Backlog — prioritizing the ticket queue like a backlog and having support staff constantly pulling from the top of the list — not only is this an awesome way to avoid the guy who scans the queue for the easy jobs, but it ensures the most important problems are being resolved first. A universal set of stakeholders does not exist for the ticket queue — someone whose ticket is ranked fifteenth on the list may disagree, and they are welcome to add details explaining why the issue is more impacting that it seems on its face. But 90% or more of our tickets are “Sev3” — which basically means both “we want it done ASAP” and “it isn’t a wide-spread high impact outage”. Realistically, dozens of tickets do not have the exact same time constraint and impact. There is extra work for management in converting a ticket bucket into an ordered backlog, but the payoff is that tickets are resolved in an order that correlates to the importance of the issue. In addition to the ticket queue, routine maintenance tasks will be included in the backlog. And prioritized accordingly.

Very short sprints — while developers moving from Waterfall to Agile might start from a month (or two) long sprint and trim weeks as they evolve into the process, operations starts from the other end of the spectrum. Our norm is to grab a ticket, sort it, then look at the queue and grab another one. We are planning for hours, maybe a day or two. This means we might establish application access on Tuesday that isn’t needed until next Monday. Establish a sprint that lasts a week, and use the backlog to get tickets that have lower priority (either because the impact is lower or because resolution is not needed for a week) included in the sprint. Service interruptions, SEV1 and SEV2 tickets, will occur and should be assumed in the sprint planning (i.e. either take enough work that you think it will just get done with no service interruption tickets and accept that some tickets from the sprint will be incomplete or leave some space for service interruption tickets and have staff pull “bonus” tickets from the top of the backlog if they have no work toward the end of the sprint).

Estimation — going through the tickets and classifying each incident as a quick little task, something that will take a few hours, or a significant undertaking facilitates in sprint planning. It’s difficult to know how many tickets I can reasonably expect to include in a sprint if I cannot differentiate between a three minute config change and a three day application rollout.

Multi-tasking — Implementation, support, and ticket resolution tasks are no longer a big bucket of work that individuals attempt to multi-task to complete. There are distinct tasks that are completed in series. Some tickets require information from the user; put the ticket on hold until a response is received and move on to the next unit of work.

Velocity — historic data based on time estimates cannot be generated, but simple number of tickets per week pre- and post- can certainly be compared. And going forward, ticket counts can be weighted by estimation values.

Stand-ups are a bit of a mental sticking point for me. I can conceive the value of spending a few minutes reviewing what you’ve done, what you plan on doing, and ensuring there is a ready forum to discuss any sticking points (maybe someone else has encountered a similar situation and can offer assistance). Stand-ups could include a quick discussion of any priority shifts (escalations, service interruptions) too. *But* my experience with stand-ups has been the attendance test variety — stand-ups that were used to hurt individuals who didn’t make it to the office by 08:00. Or those who weren’t around at 16:50. I don’t think it’s reasonable to ask someone who got into an issue and worked until 7P to show up at 8A the next day. I also don’t think it is reasonable to expect someone who came in at 6A to continue working until 5P. Were a stand-up scheduled in the middle of the day, I might feel differently about them.

New Process Police

As an operational support group, we did not have a software development methodology. Doesn’t mean we didn’t develop software — one of the great things within operational support is the ability to automate day-to-day tasks to reduce workload. Why have someone check for application patches when a process can watch an RSS feed or file repository and notify us when there’s an update. Why have someone clickity-click provisioning users into groups when the user can make a web request, the group owner can approve the request, and an automated process can add the user into the group? The end result of our automation programming is, well, quite a bit of software.

And with a small number of people, informal application development worked. Wasn’t ideal, but it worked. If you want to write in Java while I use C# … not ideal, but the alternative is that one of us needs to learn a new programming language. Problem is the next guy we hired uses VBS, the next guy uses PowerShell … and I’ll use perl for simpler processes. Then someone starts tweaking my code and buggers it up … and we’ve got to figure out what happened and roll back based on some tape backup.

To get our internal software development processes organized, I developed a process. And ran a training session so everyone was familiar with both the process and the tools. Some of us have used the process well — don’t edit production code, clone the repo locally, make a branch for your edits, test it and have another group member sign off on the changes, merge your branch back into master, test more, then pull the code into production. The majority, it seems, have not followed the process at all. Changes are made to the code running in production, not incorporated into the Git repo. Six months after the new development process went in place, half of our code has improperly made changes!

To an extent, I consider this a management problem … if the department doesn’t want software development to be a free-for-all, then the department managers need to ensure their staff follows the process. If the department wants everyone to do their own thing — then get rid of the process and declare our methodology as “do whatever you want”. The challenge for managers, though, is that they don’t know that someone has edited code in production and failed to commit their changes into the repo. If only there were some way to watch for improperly edited code and alert us promptly.

Other scripts I’ve found to perform a similar function attempt to parse ‘git status’ to identify all sorts of issues — but that doesn’t address the specific problem that I’ve got. To facilitate identifying offenders, I wrote a quick Python script that searches a directory tree for git repositories and alerts us when changes have not been staged for commit. If you’ve staged the changes for commit, that won’t be identified. But the particular problem we encounter frequently … there are alerts for that.

Search engines as bias algorithms

Sigh! Once again, Trump is the most lamentable victim of any persecution in history (including actual people who were bloody well burned alive as witches). Anyone remember people google-bombing Bush2 to link moron and “miserable failure” over to him? Algorithms can be used against themselves.
 
Singling out a specific feedback cycle that is damaging *to you* is a whole heap of hypocrisy (and thus neo-presidential). But Google’s “personalized” news seems to be just as much a victim of commercialization as cable news. I don’t get many articles that are complimentary of Trump. My husband does. Seeing information that substantiates your point of view is not a good thing for personal growth and awareness, but this is not a conspiracy. Just Google’s profile of us — we’re getting news that Google thinks we’ll like. I know search results are personalized in an attempt to deliver what *you* want … so it’s quite possible my search results will skew toward Trump-negative content (although as a matter of business, I would expect my husband’s to skew toward Trump-positive content). I guess we all know what Trump clicks on and what he doesn’t if *his* results aren’t just a summary of the Fox News homepage.
And thinking about search engines logically — the “secret sauce” is a bias algorithm. Back in 1994, there was an “Internet Directory” — a alphabetized listing (maybe categorized by subject … don’t recall) of web sites. It missed some. More importantly, though, there were not millions of web sites with new ones popping up every day, so I’m thinking the phone book approach to web sites might not work. If search engines crawled the Internet and returned a newest to oldest list of everything that contains the word … or a list alphabetized by author, page title, etc, … would you use a search engine??? Someone would have to write a bias search algorithm to work against search engine results. Oh, wait.

Temporary Fix: ZoneMinder, PHP7.2, openHAB ZoneMinder Binding

I got Zoneminder 1.31.45 (which includes the new CakePHP framework that doesn’t use what have become reserved words in PHP7) working with the openHAB ZoneMinder binding (which relies on data from the API at  /zm/api/configs/view/ATTR_NAME.json). There are two options, ZM_PATH_ZMS and ZM_OPT_FRAME_SERVER which now return bad parameter errors when attempting to retrieve the config using /view/. Looking through the database update scripts, it appears both of these parameters were removed at ZoneMinder 1.31.1

ZM_PATH_ZMS was removed from the Config database and placed in a config file, /etc/zm/conf.d/01-system-paths.conf. There is a PR to “munge” this value into the API so /viewByName returns its value … but that doesn’t expose it through /view.

ZM_OPT_FRAME_SERVER appears to have been eliminated as a configuration option.

You cannot simply re-insert the config options into the database, as ZoneMinder itself loads the ZM_PATH_ZMS value from the config file and then proceeds to use it. When it attempts to load config parameters from the Config table and encounters a duplicate … it falls over. We were unable to view our video through the ZoneMinder server.

*But* editing /usr/share/zoneminder/www/includes/config.php (exact path may vary, list the files from your package install and find the config.php in www/includes) to include an if clause around the section that loads config parameters from the database, and only loading the parameter when the Name is not ZM_PATH_ZMS (bit in yellow below) avoids this overlapping config value.

$result = $dbConn->query( 'select * from Config order by Id asc' );
if ( !$result )
   echo mysql_error();
   $monitors = array();
   while( $row = dbFetchNext( $result ) ) {
      if ( $defineConsts )
      // LJR 2018-08-18 I inserted this config parameter into DB to get OH2-ZM running, and need to ignore it in the ZM web code
      if( strcmp($row['Name'],'ZM_PATH_ZMS') != 0){
         define( $row['Name'], $row['Value'] );
      }
   $config[$row['Name']] = $row;
   if ( !($configCat = &$configCats[$row['Category']]) ) {
      $configCats[$row['Category']] = array();
      $configCat = &$configCats[$row['Category']];
   }
   $configCat[$row['Name']] = $row;
}

Once the ZoneMinder web site happily ignores the presence of ZM_PATH_ZMS from the database config table, you can insert it and ZM_OPT_FRAME_SERVER (an option which appears to have been removed at ZoneMinder 1.31.1) back into the Config table. **Important** — change the actual value of ZM_PATH_ZMS to whatever is appropriate for your installation. In my ZoneMinder installation, /cgi-bin-zm is the cgi-bin directory, and /cgi-bin-zm/nph-zms is the ZMS binary.

From a MySQL command line:

use zm; #Assuming your zoneminder database is actually named zm
INSERT INTO `Config` VALUES (225,'ZM_PATH_ZMS','/cgi-bin-zm/nph-zms','string','/cgi-bin-zm/nph-zms','relative/path/to/somewhere','(?^:^((?:[^/].*)?)/?$)',' $1 ','Web path to zms streaming server',' The ZoneMinder streaming server is required to send streamed images to your browser. It will be installed into the cgi-bin path given at configuration time. This option determines what the web path to the server is rather than the local path on your machine. Ordinarily the streaming server runs in parser-header mode however if you experience problems with streaming you can change this to non-parsed-header (nph) mode by changing \'zms\' to \'nph-zms\'. ','hidden',0,NULL);
INSERT INTO `Config` VALUES (226,'ZM_OPT_FRAME_SERVER','0','boolean','no','yes|no','(?^i:^([yn]))',' ($1 =~ /^y/) ? \"yes\" : \"no\" ','Should analysis farm out the writing of images to disk',' In some circumstances it is possible for a slow disk to take so long writing images to disk that it causes the analysis daemon to fall behind especially during high frame rate events. Setting this option to yes enables a frame server daemon (zmf) which will be sent the images from the analysis daemon and will do the actual writing of images itself freeing up the analysis daemon to get on with other things. Should this transmission fail or other permanent or transient error occur, this function will fall back to the analysis daemon. ','system',0,NULL);

Now restart ZoneMinder and the OH2 ZoneMinder binding. We’ve got monitors on the ZoneMinder web site, we are able to view the video stream, and OH2 picks up alarms from the ZoneMinder server.

If you re-run zmupdate.pl, it will remove these two records from the Config table. If you upgrade ZoneMinder, the change to the PHP file will be reverted.

DevOps Alternatives

While many people involved in the tech industry have a wide range of experience in technologies and are interested in expanding the breadth of that knowledge, they do not have the depth of knowledge that a dedicated Unix support person, a dedicated Oracle DBA, a dedicated SAN engineer person has. How much time can a development team reasonably dedicate to expanding the depth of their developer’s knowledge? Is a developer’s time well spent troubleshooting user issues? That’s something that makes the DevOps methodology a bit confusing to me. Most developers I know … while they may complain (loudly) about unresponsive operational support teams, about poor user support troubleshooting skills … they don’t want to spend half of their day diagnosing server issues and walking users through basic how-to’s.

The DevOps methodology reminds me a lot of GTE Wireline’s desktop and server support structure. Individual verticals had their own desktop support workforce. Groups with their own desktop support engineer didn’t share a desktop support person with 1,500 other employees in the region. Their tickets didn’t sit in a queue whilst the desktop tech sorted issues for three other groups. Their desktop support tech fixed problems for their group of 100 people. This meant problems were generally resolved quickly, and some money was saved in reduced downtime. *But* it wasn’t like downtime avoidance funded the tech’s salary. The business, and the department, decided to spend money for rapid problem resolution. Some groups didn’t want to spend money on dedicated desktop support, and they relied on corporate IT. Hell, the techs employed by individual business units relied on corporate IT for escalation support. I’ve seen server support managed the same way — the call center employed techs to manage the IVR and telephony system. The IVR is malfunctioning, you don’t put a ticket in a queue with the Unix support group and wait. You get the call center technologies support person to look at it NOW. The added advantage of working closely with a specific group is that you got to know how they worked, and could recommend and customize technologies based on their specific needs. An IM platform that allowed supervisors and resource management teams to initiate messages and call center reps to respond to messages. System usage reporting to ensure individuals were online and working during their prescribed times.

Thing is, the “proof” I see offered is how quickly new code can be deployed using DevOps methodologies. Comparing time-to-market for automated builds, testing, and deployment to manual processes in no way substantiates that DevOps is a highly efficient way to go about development. It shows me that automated processes that don’t involve waiting for someone to get around to doing it are quick, efficient, and generally reduce errors. Could similar efficiency be gained by having operation teams adopt automated processes?

Thing is, there was a down-side to having the major accounts technical support team in PA employ a desktop support technician. The major accounts technical support did not have broken computers forty hours a week. But they wanted someone available from … well, I think it was like 6AM to 10PM and they employed a handful of part time techs, but point remains they paid someone to sit around and wait for a computer to break. Their techs tended to be younger people going to school for IT. One sales pitch of the position, beyond on-the-job experience was that you could use your free time to study. Company saw it as an investment – we get a loyal employee with a B.S. in IT who moved into other orgs, college kid gets some resume-building experience, a chance to network with other support teams, and a LOT of study time that the local fast food joint didn’t offer. The access design engineering department hired a desktop tech who knew the Unix-based proprietary graphic workstations they used within the group. She also maintained their access design engineering servers. She was busier than the major accounts support techs, but even with server and desktop support she had technical development time.

Within the IT org, we had desktop support people who were nearly maxed out. By design — otherwise we were paying someone to sit around and do nothing. Pretty much the same methodology that went into staffing the call center — we might only expect two calls overnight, but we’d still employ two people to staff the phones between 10P and 6A *just* so we could say we had a 24×7 tech support line. During the day? We certainly wouldn’t hire two hundred people to handle one hundred’s worth of calls. Wouldn’t operations teams be quicker to turn around requests if they were massively overstaffed?

As a pure reporting change, where you’ve got developers and operations people who just report through the same structure to ensure priorities and goals align … reasonable. Not cost effective, but it’s a valid business decision. In a way, though, DevOps as a vogue ideology is the impetus behind financial decisions (just hire more people) and methodology changes (automate it!) that would likely have similar efficacy if implemented in silo’d verticals.

 

openHAB With Custom Built Serial Binding – fix to locking permission issue

When we updated our openHAB server to Fedora 28 and changed to a non-root user, the openhab user was unable to create lock files in /run/lock. As an interim fix, we just changed the permission on the lock folder to allow the openhab account to create files. As a more elegant solution, I’ve built the nrjavaserial JAR file from the source in NeuronRobotics’ repository.

The process to build and use a JAR built from this source follows. Before attempting to build the nrjavaserial jar from source, ensure you have gradle (which will install a LOT of additional packages), lockdev, lockdev-devel, some jdk, and some jdk-devel (I used java-1.8.0-openjdk-1.8.0.181-7.b13.fc28.x86_64 and java-1.8.0-openjdk-devel-1.8.0.181-7.b13.fc28.x86_64 because they were already installed for other projects).

# Set ossrhUsername and ossrhPassword values for the account used to build the project – username and password can be null
[lisa@server ~]# cat ~/.gradle/gradle.properties
ossrhUsername=
ossrhPassword=

# Grab the source
[lisa@server ~]# git clone https://github.com/NeuronRobotics/nrjavaserial.git

# Build the project
[lisa@server ~]# cd nrjavaserial
[lisa@server nrjavaserial]# make linux64 # assuming you’ve got 64-bit linux

# Voila, a jar file
[lisa@server nrjavaserial]# cd build/libs
[lisa@server libs]# ll
total 852
-rw-r–r– 1 root root 611694 Aug 16 10:08 nrjavaserial-3.14.0.jar
-rw-r–r– 1 root root 170546 Aug 16 10:08 nrjavaserial-3.14.0-javadoc.jar
-rw-r–r– 1 root root 85833 Aug 16 10:08 nrjavaserial-3.14.0-sources.jar

Before installing the newly built nrjavaserial-3.14.0.jar into openHAB, ensure you have lockdev installed on your Fedora machine and add your openhab user account to the lock group.

# Verify the lockdev folder was created
[lisa@server ~]# ll /run/lock/
total 4
-rw-r–r– 1 root root 22 Aug 10 15:35 asound.state.lock
drwx—— 2 root root 60 Aug 10 15:30 iscsi
drwxrwxr-x 2 root lock 140 Aug 16 12:19 lockdev
drwx—— 2 root root 40 Aug 10 15:30 lvm
drwxr-xr-x 2 root root 40 Aug 10 15:30 ppp
drwxr-xr-x 2 root root 40 Aug 10 15:30 subsys
# Add the openhab user to the lock group
[lisa@server ~]# usermod -a -G lock openhab

The openhab user account can now write to the /run/lock/lockdev folder. Install the new jar file into openHAB. When you restart openHAB, verify lock files are created as expected.
[lisa@server ~]# ll /run/lock/lockdev/
total 20
-rw-rw-r– 5 openhab openhab 11 Aug 16 12:19 LCK…31525
-rw-rw-r– 5 openhab openhab 11 Aug 16 12:19 LCK..ttyUSB-5
-rw-rw-r– 5 openhab openhab 11 Aug 16 12:19 LCK..ttyUSB-55
-rw-rw-r– 5 openhab openhab 11 Aug 16 12:19 LK.000.188.000
-rw-rw-r– 5 openhab openhab 11 Aug 16 12:19 LK.000.188.001

 

Zoneminder Snapshot With openHAB Binding

When we upgraded to Fedora 28 on our server, ZoneMinder ceased working because some CakePHP function names could no longer be used. To resolve the issue, I ended up running a snapshot build of ZoneMinder that included a newer build of CakePHP. Version 1.31.45 instead of 1.30.4-7 on the repository.

All of our cameras showed up, and although the ZoneMinder folks seem to have a bug in their SQL query when building out the table of event counts on the main page (that is, all of my monitors have blank instead of event counts and my apache log is filled with

[Wed Aug 15 12:08:37.152933 2018] [php7:notice] [pid 32496] [client 10.5.5.234:14705] ERR [SQL-ERR 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'and E.MonitorId = '13' ),1,NULL)) as EventCount1, count(if(1 and (  and E.Monito' at line 1', statement was 'select count(if(1 and ( E.MonitorId = '13' ),1,NULL)) as EventCount0, count(if(1 and (  and E.MonitorId = '13' ),1,NULL)) as EventCount1, count(if(1 and (  and E.MonitorId = '13' ),1,NULL)) as EventCount2, count(if(1 and (  and E.MonitorId = '13' ),1,NULL)) as EventCount3, count(if(1 and (  and E.MonitorId = '13' ),1,NULL)) as EventCount4, count(if(1 and (  and E.MonitorId = '13' ),1,NULL)) as EventCount5 from Events as E where MonitorId = ?' params:13]

… it works.

Until Scott checked openHAB, where all of the items are offline. Apparently the openHAB ZoneMinder binding is using the cgi-bin stuff to get the value of ZM_PATH_ZMS. A config option which was removed from the database as part of the upgrade process.

Upgrading database to version 1.31.1
Loading config from DBNo option 'ZM_DIR_EVENTS' found, removing.
No option 'ZM_DIR_IMAGES' found, removing.
No option 'ZM_DIR_SOUNDS' found, removing.
No option 'ZM_FRAME_SOCKET_SIZE' found, removing.
No option 'ZM_OPT_FRAME_SERVER' found, removing.
No option 'ZM_PATH_ARP' found, removing.
No option 'ZM_PATH_LOGS' found, removing.
No option 'ZM_PATH_MAP' found, removing.
No option 'ZM_PATH_SOCKS' found, removing.
No option 'ZM_PATH_SWAP' found, removing.
No option 'ZM_PATH_ZMS' found, removing.
 207 entries
Saving config to DB 207 entries
Upgrading DB to 1.30.4 from 1.30.3

The calls from openHAB yield 404 errors in the access_log

10.0.0.5 - - [15/Aug/2018:09:38:04 -0400] "GET /zm/api/configs/view/ZM_PATH_ZMS.json HTTP/1.1" 404 1751 "-" "Jetty/9.3.21.v20170918"

 

Unfortunately they’ve changed the URL to get these values — it’s “munged” from the config file as the parameters are no longer stored to the Config table.
http://zoneminder.domain.ccTLD/zm/api/configs/view/ZM_PATH_ZMS.json
is now
http://zoneminder.domain.ccTLD/zm/api/configs/viewByName/ZM_PATH_ZMS.json

So … that’s a problem!

Running OpenHAB2 As Non-Root User — With USB

I’ll prefix this saga with the fact my sad story is implementation specific (i.e. relevant to those using Fedora, RHEL, or CentOS). I know Ubuntu has its own history with handling locks, and I’m sure other distros do as well. But I don’t know the history there, nor do I know how they currently manage locking.

We switched our openHAB installation to use a systemd unit file to run as a service and changed the execution to a non-root user. Since we knew the openhab service account needed to be a member of dialout and tty, and we’d set the account up properly, we expected everything would work beautifully.

Aaaand … neither ZWave for ZigBee came online. Not because it couldn’t access the USB devices, but because the non-root user could not lock the USB devices. From journalctl, we see LOTS of error messages that are not reflected in openHAB:

-- Logs begin at Sun 2017-04-30 14:28:12 EDT, end at Sun 2018-08-12 19:10:32 EDT. --
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyUSB-55: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: [34B blob data]
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyUSB-5: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: [34B blob data]
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyUSB1: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: [34B blob data]
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyUSB0: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: [34B blob data]
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS31: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: [34B blob data]
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS30: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS29: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS28: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS27: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS26: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS25: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: check_group_uucp(): error testing lock file creation Error details:Permission deniedcheck_lock_status: No permission to create lock fi>
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyS24: Permission denied. FAILED TO OPEN: No such file or directory
Aug 12 18:36:19 server.domain.ccTLD start.sh[7448]: testRead() Lock file failed

And now my old-school Linux/Unix knowledge totally screws me over — I expected a uucp group with write access to /run/lock. Except … there’s no such group. Evidently in RHEL 7.2, they started using a group named lock with permission to /var/lock to differentiate between serial devices (owned by uucp) and lock files. Nice bit of history, that, but Fedora and RedHat don’t do that anymore either.

Having a group with write permission was deemed a latent privilege escalation vulnerability, and they played around with having a lockdev binary writing files to /run/lock/lockdev, the creation and configuration of lockdev was moved into systemd, and then removed from systemd in favor of approaches [flock(), for instance].

RXTX has a hard-coded path based on OS version — that is what is used to create the lock file. And as the /run/lock folder is writable only by the owner, root … that is what is failing.

#if defined(__linux__)
/*
	This is a small hack to get mark and space parity working on older systems
	https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=147533
*/
#	if !defined(CMSPAR)
#		define CMSPAR 010000000000
#	endif /* CMSPAR */
#	
#	define DEVICEDIR "/dev/"
#	define LOCKDIR "/var/lock"
#	define LOCKFILEPREFIX "LCK.."
#	define FHS
#endif /* __linux__ */

Which is odd because I see a few threads about how nrjavaserial has been updated and as soon as the newer nrjavaserial gets bundled into the application, locking will all be sorted. And there’s an open issue for exactly the problem we are having … which explains why I’m not seeing something different in their source code. Digging around more, it looks like they didn’t actually change the hardcoded paths but rather added support for liblockdev. Which prompted my hypothesis that simply installing the lockdev package would magically sort the issue. It did not.

In the interim, though, we can just add write permission for /run/lock thorough the config file /usr/lib/tmpfiles.d/legacy.conf — the distro creates the lock directory owned by root:root. Original config lines:

d /run/lock 0755 root root -
L /var/lock - - - - ../run/lock

We can create the folder as owned by the lock group group and add group write permissions (realizing that creates the potential for privilege escalation attacks). Updated config lines:

#d /run/lock 0755 root root -
d /run/lock 0775 root lock -
L /var/lock - - - - ../run/lock

Adding the openhab account to the lock group allows the LCK.. files to be created.

[lisa@server run]# usermod -a -G lock openhab
[lisa@server run]# id openhab
uid=964(openhab) gid=963(openhab) groups=963(openhab),5(tty),18(dialout),54(lock)

Either reboot to reprocess legacy.conf or manually change the ownership & permissions on /run/lock. Either way, confirm that the changes are successful.

[lisa@server run]# chown root:lock /run/lock
[lisa@server run]# chmod g+w lock
[lisa@server lock]# ll /run | grep lock
drwxrwxr-x  7 root           lock             200 Aug 13 14:03 lock

If you manually set the permissions, restart openHAB. Our devices are online, and we have lock files:

[lisa@seerver lock]# ll
total 12
-rw-r--r-- 1 root root 22 Aug 10 15:35 asound.state.lock
drwx------ 2 root root 60 Aug 10 15:30 iscsi
-rw-r--r-- 1 openhab openhab 11 Aug 13 14:03 LCK..ttyUSB-5
-rw-r--r-- 1 openhab openhab 11 Aug 13 14:03 LCK..ttyUSB-55
drwxrwxr-x 2 root lock 40 Aug 10 15:30 lockdev
drwx------ 2 root root 40 Aug 10 15:30 lvm
drwxr-xr-x 2 root root 40 Aug 10 15:30 ppp
drwxr-xr-x 2 root root 40 Aug 10 15:30 subsys

 

Testing udev Rules

The first step of testing a udev rule is to determine the actual device you want to test. Get the info for the /dev/thing and find the real /device/path/…. (note, make sure you’re not in a “looking at parent” section — you want the one all the way at the top)

[lisa@linuxhost dev]# udevadm info -a /dev/ttyUSB0 | more
...
looking at device '/devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0':
...

Once you have the device, use udevadm in test mode and you will see the results from all of your udev rules. Including group & permission mask applied to the device.

[lisa@linuxhost dev]# udevadm test /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0 | more
calling: test
version 238
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
...
47437 strings (385590 bytes), 43385 de-duplicated (347383 bytes), 4053 trie nodes used
PROGRAM 'usb_modeswitch --symlink-name /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0 10c4 8a2a ' /usr/lib/udev/rules.d/40-usb_modeswitch.rules:10
starting 'usb_modeswitch --symlink-name /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0 10c4 8a2a '
Process 'usb_modeswitch --symlink-name /devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0 10c4 8a2a ' succeeded.
GROUP 18 /usr/lib/udev/rules.d/50-udev-default.rules:25
IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/60-serial.rules:7
IMPORT builtin 'usb_id' /usr/lib/udev/rules.d/60-serial.rules:8
/sys/devices/pci0000:00/0000:00:02.0/usb2/2-1/2-1:1.0: if_class 255 protocol 0
IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/60-serial.rules:8
IMPORT builtin 'path_id' /usr/lib/udev/rules.d/60-serial.rules:15
LINK 'serial/by-path/pci-0000:00:02.0-usb-0:1:1.0-port0' /usr/lib/udev/rules.d/60-serial.rules:17
IMPORT builtin skip 'usb_id' /usr/lib/udev/rules.d/60-serial.rules:19
LINK 'serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_90F0016B-if00-port0' /usr/lib/udev/rules.d/60-serial.rules:24
GROUP 18 /etc/udev/rules.d/99-server.rules:5
MODE 0666 /etc/udev/rules.d/99-server.rules:5
LINK 'ttyUSB-5' /etc/udev/rules.d/99-server.rules:5
handling device node '/dev/ttyUSB0', devnum=c188:0, mode=0666, uid=0, gid=18
preserve permissions /dev/ttyUSB0, 020666, uid=0, gid=18
preserve already existing symlink '/dev/char/188:0' to '../ttyUSB0'
found 'c188:0' claiming '/run/udev/links/\x2fserial\x2fby-id\x2fusb-Silicon_Labs_HubZ_Smart_Home_Controller_90F0016B-if00-port0'
creating link '/dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_90F0016B-if00-port0' to '/dev/ttyUSB0'
preserve already existing symlink '/dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_90F0016B-if00-port0' to '../../ttyUSB0'
found 'c188:0' claiming '/run/udev/links/\x2fserial\x2fby-path\x2fpci-0000:00:02.0-usb-0:1:1.0-port0'
creating link '/dev/serial/by-path/pci-0000:00:02.0-usb-0:1:1.0-port0' to '/dev/ttyUSB0'
preserve already existing symlink '/dev/serial/by-path/pci-0000:00:02.0-usb-0:1:1.0-port0' to '../../ttyUSB0'
found 'c188:0' claiming '/run/udev/links/\x2fttyUSB-5'
creating link '/dev/ttyUSB-5' to '/dev/ttyUSB0'
preserve already existing symlink '/dev/ttyUSB-5' to 'ttyUSB0'
...

Or as a one-liner:

udevadm test `udevadm info -a /dev/ttyUSB0 | grep "looking at device" | sed "s/looking at device '//" | sed "s/'://"`

Zoneminder and PHP 7.2

After updating to php 7.2, ZoneMinder completely stopped working. Fortunately there were lots of entries in the error_log file

[Fri Aug 10 15:44:19.880809 2018] [php7:error] [pid 5293] [client 127.0.0.1:46958] PHP Fatal error:  Cannot use 'Object' as class name as it is reserved in /usr/share/zoneminder/www        /api/lib/Cake/Core/Object.php on line 30
[Fri Aug 10 15:44:19.889737 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/lib/zoneminder/templogs/cake_error.log) [function.file-put-contents]: failed to open stream: No such file or directory in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on         line 142
[Fri Aug 10 15:44:19.889850 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/log/zonemindererror.log) [function.file-put-contents]: failed to open stream: Permission denied in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on line 142
[Fri Aug 10 15:44:19.890176 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/lib/zoneminder/templogs/cake_error.log) [function.file-put-contents]: failed to open stream: No such file or directory in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on         line 142
[Fri Aug 10 15:44:19.890221 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/log/zonemindererror.log) [function.file-put-contents]: failed to open stream: Permission denied in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on line 142
[Fri Aug 10 15:44:19.890594 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/lib/zoneminder/templogs/cake_error.log) [function.file-put-contents]: failed to open stream: No such file or directory in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on         line 142
[Fri Aug 10 15:44:19.890637 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/log/zonemindererror.log) [function.file-put-contents]: failed to open stream: Permission denied in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on line 142
[Fri Aug 10 15:44:19.890960 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/lib/zoneminder/templogs/cake_error.log) [function.file-put-contents]: failed to open stream: No such file or directory in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on         line 142
[Fri Aug 10 15:44:19.891021 2018] [php7:warn] [pid 5293] [client 127.0.0.1:46958] PHP Warning:  file_put_contents(/var/log/zonemindererror.log) [function.file-put-contents]: failed to open stream: Permission denied in /usr/share/zoneminder/www/api/lib/Cake/Log/Engine/FileLog.php on line 142
[Fri Aug 10 15:44:19.892818 2018] [php7:error] [pid 5293] [client 127.0.0.1:46958] PHP Fatal error:  Uncaught Error: Class 'Controller' not found in /usr/share/zoneminder/www/api/li        b/Cake/Error/ExceptionRenderer.php:174\nStack trace:\n#0 /usr/share/zoneminder/www/api/lib/Cake/Error/ExceptionRenderer.php(92): ExceptionRenderer->_getController(Object(InternalErr        orException))\n#1 /usr/share/zoneminder/www/api/lib/Cake/Error/ErrorHandler.php(126): ExceptionRenderer->__construct(Object(InternalErrorException))\n#2 /usr/share/zoneminder/www/ap        i/lib/Cake/Error/ErrorHandler.php(284): ErrorHandler::handleException(Object(InternalErrorException))\n#3 /usr/share/zoneminder/www/api/lib/Cake/Error/ErrorHandler.php(213): ErrorHa        ndler::handleFatalError(64, 'Cannot use 'Obj...', '/usr/share/zone...', 30)\n#4 /usr/share/zoneminder/www/api/lib/Cake/Core/App.php(970): ErrorHandler::handleError(64, 'Cannot use '        Obj...', '/usr/share/zone...', 30, Array)\n#5 /usr/share/zoneminder/www/api/lib/Cake/Core/App.php(943): App::_checkFatalError()\n#6 [internal function]: App::shutdown()\n#7 {main}\n          thrown in /usr/share/zoneminder/www/api/lib/Cake/Error/ExceptionRenderer.php on line 174

Looks like CakePHP used class names that are now reserved words. Unfortunately, you cannot just drop the updated CakePHP files into ZoneMinder (I tried). Until the repository package is updated, you’ve got to build ZoneMinder from source. Or grab the testing RPM from the zoneminder.com repo. 1.31.45-1.13 works. Don’t forget to run “perl /usr/bin/zmupdate.pl” to update the database.

Then I had to throw the database connection into from the config file into /usr/share/zoneminder/www/api/app/Config/database.php (default array) because I do not use the default connection info.

Once I had the updated ZoneMinder along with the newer CakePHP that the ZM folks have in their repo … we’ve got ZoneMinder again.