Category: System Administration

Office 365 Activation Failure

We’ve been working to lock down our workstations … not “so secure you cannot use it”, but just this side of the functional/nonfunctional line. Everything went surprisingly well except I use the Office 365 suite for work. Periodically, it has to “phone home” and verify my work account is still valid. And that didn’t seem to go through the proxy well. The authentication screen would pop up and immediately throw an error:

No internet connection. Please check your network settings and try again [2604]

I spent a whole bunch of time playing around with the firewall rules, the proxy rules … and finally went so far as to just turn off the firewall and remove the proxy. And it still didn’t work. Which was nice because it means I didn’t break it … but also meant it was going to be a lot harder to fix!

Finally found the culprit — a new Windows installation, for some reason, uses really old SSL/TLS versions. Turned on 1.2 and, voila, I’ve got a sign-on screen. Sigh! Turned the firewall & proxy back on, and everything works beautifully. I think I’m going to add these settings to the domain policy so I don’t have to configure this silliness every time.

Firewall Settings: Local Network Access Plus Skype

I’m playing around with blocking all outbound connections on our computers and run most traffic through a proxy … Skype, however, won’t make voice/video calls with the HTTPS proxy set. We had to add a lot of subnets to the ruleset before the called party would get a ring. But it finally worked. This is the NFT ruleset, but I’ve got the same subnets added to the Windows Firewall too.

table inet filter {
        chain WIFI-FILTERONLYLOCAL {
                type filter hook output priority filter; policy accept;
                ip protocol tcp ip daddr 10.0.0.0/8 accept
                ip protocol udp ip daddr 10.0.0.0/8 accept
                ip protocol tcp ip daddr 13.64.0.0/11 accept
                ip protocol tcp ip daddr 13.96.0.0/13 accept
                ip protocol tcp ip daddr 13.104.0.0/14 accept
                ip protocol tcp ip daddr 13.107.0.0/16 accept
                ip protocol tcp ip daddr 13.107.6.171/32 accept
                ip protocol tcp ip daddr 13.107.18.15/32 accept
                ip protocol tcp ip daddr 13.107.140.6/32 accept
                ip protocol tcp ip daddr 20.20.32.0/19 accept
                ip protocol tcp ip daddr 20.180.0.0/14 accept
                ip protocol tcp ip daddr 20.184.0.0/13 accept
                ip protocol tcp ip daddr 20.190.128.0/18 accept
                ip protocol tcp ip daddr 20.192.0.0/10 accept
                ip protocol tcp ip daddr 20.202.0.0/16 accept
                ip protocol udp ip daddr 20.202.0.0/16 accept
                ip protocol tcp ip daddr 20.231.128.0/19 accept
                ip protocol tcp ip daddr 40.126.0.0/18 accept
                ip protocol tcp ip daddr 51.105.0.0/16 accept
                ip protocol tcp ip daddr 51.116.0.0/16 accept
                ip protocol tcp ip daddr 52.108.0.0/14 accept
                ip protocol tcp ip daddr 52.112.0.0/14 accept
                ip protocol tcp ip daddr 52.138.0.0/16 accept
                ip protocol udp ip daddr 52.138.0.0/16 accept
                ip protocol tcp ip daddr 52.145.0.0/16 accept
                ip protocol tcp ip daddr 52.146.0.0/15 accept
                ip protocol tcp ip daddr 52.148.0.0/14 accept
                ip protocol tcp ip daddr 52.152.0.0/13 accept
                ip protocol tcp ip daddr 52.160.0.0/11 accept
                ip protocol tcp ip daddr 52.244.37.168/32 accept
                ip protocol tcp ip daddr 138.91.0.0/16 accept
                ip protocol udp ip daddr 138.91.0.0/16 accept
                ip protocol icmp accept
                ip protocol udp ct state { established, related } accept
                limit rate over 1/second log prefix "FILTERONLYLOCAL: "
                drop
        }
}

Python Script: Alert for pending SAML IdP Certificate Expiry

I got a rather last minute notice from our security department that the SSL certificate used in the IdP partnership between my application and their identity provider would be expiring soon and did I want to renew it Monday, Tuesday, or Wednesday. Being that this was Friday afternoon … “none of the above” would have been my preference to avoid filing the “emergency change” paperwork, but Wednesday was the least bad of the three options. Of course, an emergency requires paperwork as to why you didn’t plan two weeks in advance. And how you’ll do better next time.

Sometimes that is a bit of a stretch — next time someone is working on the electrical system and drops a half-inch metal plate into the building wiring, I’m probably still going to have a problem when the power drops. But, in this case, there are two perfectly rational solutions. One, of course, would be that the people planning the certificate renewals start contacting partner applications more promptly. But that’s not within my purview. The thing I can do is watch the metadata on the identity provider and tell myself when the certificates will be expiring soon.

So I now have a little python script that has a list of all of our SAML-authenticated applications. It pulls the metadata from PingID, loads the X509 certificate, checks how far in the future the expiry date is. In my production version, anything < 30 days sends an e-mail alert. Next time, we can contact security ahead of time, find out when they’re planning on doing the renewal, and get the change request approved well in advance.

import requests
import xml.etree.ElementTree as ET
from cryptography import x509
from cryptography.hazmat.backends import default_backend
from datetime import datetime, date

strIDPMetadataURLBase = 'https://login.example.com/pf/federation_metadata.ping?PartnerSpId='
listSPIDs = ["https://tableau.example.com", "https://email.example.com", "https://internal.example.com", "https://salestool.example.com"]

for strSPID in listSPIDs:
    objResults = requests.get(f"{strIDPMetadataURLBase}{strSPID}")
    if objResults.status_code == 200:
        try:
            root = ET.fromstring(objResults.text)

            for objX509Cert in root.findall("./{urn:oasis:names:tc:SAML:2.0:metadata}IDPSSODescriptor/{urn:oasis:names:tc:SAML:2.0:metadata}KeyDescriptor/{http://www.w3.org/2000/09/xmldsig#}KeyInfo/{http://www.w3.org/2000/09/xmldsig#}X509Data/{http://www.w3.org/2000/09/xmldsig#}X509Certificate"):
                strX509Cert = f"-----BEGIN CERTIFICATE-----\n{objX509Cert.text}\n-----END CERTIFICATE-----"

                cert = x509.load_pem_x509_certificate(bytes(strX509Cert,'utf8'), default_backend())
                iDaysUntilExpiry = cert.not_valid_after - datetime.today()
                print(f"{strSPID}\t{iDaysUntilExpiry.days}")
        except:
            print(f"{strSPID}\tFailed to decode X509 Certficate")
    else:
        print(f"{strSPID}\tFailed to retrieve metadata XML")

Fedora 39: Load Balancing Across Two Network Connections

I think this is one of those things that people don’t normally do at home, and the folks who configure this in enterprises know what they’re doing and don’t need guidance on how to do basic network things. But … we wanted to have two network cards in our server so high network traffic usage like backups and TV recording don’t create contention. When I was a server admin, I’d set up link aggregation — bonding, teaming — and it just magically worked. We’d put in a port request to get the new port turned up, note it was going to be a teamed interface, do our OS config, and everything was fine. What the network guys did? I had no idea. Well, now I do!

On the switch — a Cisco 2960-S in this case — you need to create an EtherChannel and assign the ports to that channel. Telnet’ing to the switch, you first need to elevate your privileges as we start with level 1

wc2906s01>show priv
Current privilege level is 1

One you’ve entered privilege level 15, go into config term. Create the port-channel interface and assign it a number (I used 1, but 1 through 6 are options). Then go into each interface and add it to the port channel group you just created (again 1) — I set the mode to “on” because I doubt our server is going to negotiate PAgP and I didn’t want to get into setting up LACP.

enable 15
config term

interface Port-channel 1

interface GigabitEthernet1/0/13
channel-group 1 mode on

interface GigabitEthernet1/0/14
channel-group 1 mode on

# src-mac is the default, can change to something else
# e.g. src-dst-mac would be set using
# port-channel load-balance src-dst-mac
end

Done! Using show etherchannel summary confirms that this worked:

wc2906s01>show etherchannel summary
Flags: D - down P - bundled in port-channel
I - stand-alone s - suspended
H - Hot-standby (LACP only)
R - Layer3 S - Layer2
U - in use f - failed to allocate aggregator

M - not in use, minimum links not met
u - unsuitable for bundling
w - waiting to be aggregated
d - default port

Number of channel-groups in use: 1
Number of aggregators: 1

Group Port-channel Protocol Ports
------+-------------+-----------+-----------------------------------------------
1 Po1(SU) - Gi1/0/13(P) Gi1/0/14(P)

Then you can configure a network bond in Fedora and add the physical interfaces. Since we’re using KVM/QEMU, there is a VMBridge bridge that contains the bond, and the bond joins two physical interfaces named enp10s2 and enp0s25

# VM Bridge configuration

[lisa@fedora39 /etc/NetworkManager/system-connections/]# cat vmbridge.nmconnection
[connection]
id=vmbridge
uuid=b2bca190-827b-4aa4-a4f5-95752525e5e5
type=bridge
interface-name=vmbridge
metered=2
timestamp=1708742580

[ethernet]

[bridge]
multicast-snooping=false
priority=1
stp=false

[ipv4]
address1=10.1.2.3/24,10.5.5.1
dns=10.1.2.200;10.1.2.199;
dns-search=example.com;
may-fail=false
method=manual

[ipv6]
addr-gen-mode=stable-privacy
method=disabled

[proxy]

 

# Bond configuration — master is the vmbridge, and the round robin load balancing option is used.
[lisa@fedora39 /etc/NetworkManager/system-connections/]# cat bond0.nmconnection
[connection]
id=bond0
uuid=15556a5e-55c5-4505-a5d5-a5c547b5155b
type=bond
interface-name=bond0
master=vmbridge
metered=2
slave-type=bridge
timestamp=1708742580

[bond]
downdelay=0
miimon=1
mode=balance-rr
updelay=0

[bridge-port]

# Finally two network interfaces that are mastered by bond2
[lisa@fedora39 /etc/NetworkManager/system-connections/]# cat enp0s25.nmconnection
[connection]
id=enp0s25
uuid=159535a5-65e5-45f5-a505-a53555958525
type=ethernet
interface-name=enp0s25
master=bond0
metered=2
slave-type=bond
timestamp=1708733538

[ethernet]
auto-negotiate=true
mac-address=55:65:D5:15:A5:25
wake-on-lan=32768

[lisa@fedora39 /etc/NetworkManager/system-connections/]# cat enp10s2.nmconnection
[connection]
id=enp10s2
uuid=158525f5-f5d5-4515-9525-55e515c585b5
type=ethernet
interface-name=enp10s2
master=bond0
metered=2
slave-type=bond
timestamp=1708733538

[ethernet]
auto-negotiate=true
mac-address=55:35:25:D5:45:B5
wake-on-lan=32768

 

Restart NetworkManager to bring everything online. Voila — two network interfaces joined together and connected to the switch. Check out the bond file under /proc/net/bonding to verify this side is working.

[lisa@fedora39 ~/]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.7.5-200.fc39.x86_64

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 1
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp0s25
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 55:65:d5:15:a5:25
Slave queue ID: 0

Slave Interface: enp10s2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 55:35:35:d5:45:b5
Slave queue ID: 0

OAUTH Authentication in Apache Airflow (2.8.1)

There’s a rather innocuous sounding bug in Apache Airflow that should be corrected in 2.8.2 — https://github.com/apache/airflow/pull/36538 — that means you absolutely cannot set up SSO using OAUTH with FabAirflowSecurityManagerOverride. Using the deprecated AirflowSecurityManager would work, manually updating your Apache Airflow code with the fix will work. But there’s no point in trying to set up SSO with the FabAirflowSecurityManagerOverride as your custom security manager — whatever lovely code you write won’t be invoked, you’ll get an error saying the username or email address is not present even though you thoughtfully wrote out some custom code to map out those exact attributes, and it all looks like it should be working!

Determining Active Directory Version

We have a number of applications that authenticate to Active Directory. Invariably, when there are authentication issues, the vendor support person asks “what version of AD is this?” … not an unreasonable question, but also not something the person who supports Application XYZ is apt to know in a larger company. Fortunately, there are a few places within the directory that you can find details about AD versions.

The simplest is the version of Windows the domain controllers are running … although it’s possible domain controllers have been upgraded but the AD functional level has not yet been changed.

ldapsearch -h ad.example.com -D "ldapquery@example.com" -w "P@s54LD@pQu3ry" -p389 -b "ou=domain controllers,dc=example,dc=com" "(&(objectClass=computer))" operatingSystem

CN=dc007,OU=Domain Controllers,dc=example,DC=com
operatingSystem=Windows Server 2019 Datacenter

CN=dc008,OU=Domain Controllers,dc=example,DC=com
operatingSystem=Windows Server 2019 Datacenter

CN=dc020,OU=Domain Controllers,dc=example,DC=com
operatingSystem=Windows Server 2019 Datacenter

CN=dc021,OU=Domain Controllers,dc=example,DC=com
operatingSystem=Windows Server 2019 Datacenter

 

You can also find the objectVersion of the schema:

ldapsearch -h ad.example.com -D "ldapquery@example.com" -w "P@s54LD@pQu3ry" -p389 -b "cn=schema,cn=configuration,dc=example,dc=com" "(&(objectVersion=*))" objectVersion

CN=Schema,CN=Configuration,dc=example,DC=com
objectVersion=88

What does 88 mean? It depends! Either Windows 2019 or 2022

Version Operating System
13 Windows 2000 Server
30 Windows Server 2003 (Before R2)
31 Windows Server 2003 R2
44 Windows Server 2008 (Before R2)
47 Windows Server 2008 R2
56 Windows Server 2012
69 Windows Server 2012 R2
87 Windows Server 2016
88 Windows Server 2019
88 Windows Server 2022

 

Or the functional level of the forest and its partitions:

ldapsearch -H ldap://ad.example.com -D "ldapquery@example.com" -w "P@s54LD@pQu3ry" -b "cn=partitions,cn=configuration,dc=example,dc=com" "(&(MSDS-Behavior-Version=*))" MSDS-Behavior-Version

dn: CN=Partitions,CN=Configuration,DC=example,DC=com
msDS-Behavior-Version: 7

dn: CN=EXAMPLE,CN=Partitions,CN=Configuration,DC=example,DC=com
msDS-Behavior-Version: 7

What does 7 mean? Well, that depends too. It’s either Windows 2016 or 2019!

msDS-Behavior-Version Forest
Domain Domain Controller
0 2000 2000 Mixed / Native 2000
1 2003 Interim 2003 Interim N/A
2 2003 2003 2003
3 2008 2008 2008
4 2008 R2 2008 R2 2008 R2
5 2012 2012 2012
6 2012 R2 2012 R2 2012 R2
7 2016 2016 2016
7 2019 2019 2019

 

Fedora: Finding Build Parameters for RPM

There have been a few times I’ve needed to make a custom build of an application — to enable some feature that the default build does not include or to use a newer version than is available in the package repository — and I always thought it would be really useful to know what the build parameters were. Turns out you can find how Fedora packages were build.

Go to https://koji.fedoraproject.org and search for the package

Locate ‘yours’ – the right version of the application and the right version of Fedora – and click on the package name

Scroll down to the “Logs” section – click on the “build.log” for the proper architecture

Here, you will see the entire log for building the RPM but part of that is building the application from source. You’ll be able to find the configuration and make parameters used in the build. As an example, I was trying to determine if Gerbera was build with the REUSEADDR flag (it was) and LIBUPNP disable-blocking-tcp-connections

https://kojipkgs.fedoraproject.org//packages/gerbera/2.0.0/1.fc39/data/logs/x86_64/build.log

In my particular case, I then had to find the libupnp package and see how that was built – they’ve got enable blocking tcp connections. Reusing the parameters from the RPM allows me to build packages that land files in “the right place” (or, rather, the place used in the Fedora package) and include any features they’ve included.

Tableau Error After Upgrading to 2023.3

I started upgrading our Tableau servers to 2023.3 this week. Several dashboards no longer rendered after the upgrade — throwing an error “TableauException: Incorrect data type real, getting expected integer type.” … resetting or not resetting the view did not help.

This is evidently a known issue (although the documentation prior to my reporting the issue seemed to go out of its way to say it is just the cloud platform being impacted)

https://issues.salesforce.com/issue/a028c00000iwtkXAAQ/~
https://kb.tableau.com/articles/Issue/error-incorrect-data-type-real-getting-expected-integer-type-occurs-intermittently-during-view-rendering

Both the server and desktop client are impacted — and, unlike their documentation that says it is intermittent? Not all workbooks are impacted, but the ones that are? A broken workbook is broken and will not render for anyone, anywhere, any time.

There is a workaround:

Server:
tsm configuration set -k features.EnableLogicalQueryBatchProcessor -v false
tsm pending-changes apply

Desktop:
“\Program Files\Tableau\Tableau 2023.3\bin\tableau.exe” -DOverride=EnableLogicalQueryBatchProcessor:off

ISC Bind 9.18 and Windows DNS

After upgrading all of our Linux hosts to Fedora 39, we are running ISC bind 9.18.21 … and it seems the ISC folks are finally done with Microsoft’s “kinda sorta RFC compliance”. Instead of just working around Windows DNS servers having some quirks … they now fail to AXFR the domain.

Fortunately, you can tell bind to stop doing edns ‘stuff‘ by adding a server{} section to named.conf — this gives the server some instructions on how to communicate with the listed server. When bind is no longer trying to do edns “stuff”, Windows doesn’t have an opportunity to provide a bad response, so the AXFR doesn’t fail.