Connector block problems

I have a project that uses the ‘connector’ block to pass data from a sensor, which makes its data available on port 7575, to an InfluxDB database. Everything had been working absolutely fine for months, but a couple of days ago, after a restart, the connector block started failing to set up properly with the following error message:

12.03.23 20:04:52 (+0000)  connector  
12.03.23 20:04:52 (+0000)  connector  balenaBlocks connector version: 1.1.2
12.03.23 20:04:52 (+0000)  connector  Changing hostname to c9eb067
12.03.23 20:04:52 (+0000)  connector  Generating config
12.03.23 20:04:54 (+0000)  connector  balenablocks/connector
12.03.23 20:04:54 (+0000)  connector  ----------------------
12.03.23 20:04:54 (+0000)  connector  Intelligently connecting data sources with data sinks
12.03.23 20:04:55 (+0000)  connector  Traceback (most recent call last):
12.03.23 20:04:55 (+0000)  connector    File "/app/./autowire.py", line 96, in <module>
12.03.23 20:04:55 (+0000)  connector      config = autowire.GetConfig()
12.03.23 20:04:55 (+0000)  connector    File "/app/./autowire.py", line 62, in GetConfig
12.03.23 20:04:55 (+0000)  connector      services = self.GetServices()
12.03.23 20:04:55 (+0000)  connector    File "/app/./autowire.py", line 33, in GetServices
12.03.23 20:04:55 (+0000)  connector      device = self.balena.models.device.get_with_service_details(device_id, True)
12.03.23 20:04:55 (+0000)  connector    File "/root/.local/lib/python3.9/site-packages/balena/models/device.py", line 309, in get_with_service_details
12.03.23 20:04:55 (+0000)  connector      raw_data = self.base_request.request(
12.03.23 20:04:55 (+0000)  connector    File "/root/.local/lib/python3.9/site-packages/balena/base_request.py", line 197, in request
12.03.23 20:04:55 (+0000)  connector      raise exceptions.RequestError(response._content)
12.03.23 20:04:55 (+0000)  connector  balena.exceptions.RequestError: b''
12.03.23 20:04:56 (+0000) Service exited 'connector sha256:3652f9a3f1dc9dd7253dba82b9fac518f78ccd1c874d95c1c25cbc6d190f6768'

I just don’t know why this should happen. I’ve tried lots of things, such as an earlier release, a different Raspberry Pi 4, and an earlier operating system version, but nothing works. Here are the relevant sections of my docker compose file, where you can see the sensor (HMT333) and InfluxDB services. I just can’t understand why this should happen when I haven’t changed anything to do with the HMT333 or InfluxDB sections:


  hmt333:
    privileged: true
    build: ./hmt333
    restart: always
    expose:
      - '7575'
    volumes:
      - 'settings:/data'

  influxdb:
    image: influxdb@sha256:73f876e0c3bd02900f829d4884f53fdfffd7098dd572406ba549eed955bf821f
    container_name: influxdb
    restart: always
    environment:
      - INFLUX_DATA_DIR=/data
      - PERSISTENT=1
    volumes:
      - 'sensor-data:/data'

  connector:
    image: balenablocks/connector:raspberrypi4-64
    restart: always
    labels:
      io.balena.features.balena-api: '1' # necessary to discover services
      io.balena.features.supervisor-api: 1  # necessary to discover services in local mode
    privileged: true # necessary to change container hostname
    depends_on:
      - influxdb
      - hmt333

I’ve tried creating a new device etc., but I get the same problem. Note that the ‘connector’ section in my docker compose doesn’t point to ‘bh.cr/balenalabs/connector-aarch64’, as that still has an open ‘ModuleNotFound’ issue which still occurs on the Raspberry Pi 4.

I wonder if I’m correct in thinking that this error means the connector block is not getting a response from the sensor (HMT333)? I can see that data is being received from the sensor though. I do also have another device in this fleet which is running fine but it hasn’t rebooted for a long time. On that device I see this log message frequently:

12.03.23 20:31:40 (+0000)  hmt333  172.17.0.8 - - [12/Mar/2023 20:31:40] "GET / HTTP/1.1" 200 -

I think this is the connector block getting data from the sensor. Obviously I’m not seeing it on the failing device, as the connector block isn’t starting up.

To test, I also set up this project for a Raspberry Pi 3: exactly the same error, though it reports the connector version as 1.1.6.

The Python snippet below is what I use to serve my sensor data, and I do get a log message to say the service has started up. This has always worked fine anyway:

  while True:
      # serve_forever() blocks; the loop only restarts the server if it ever returns
      server_address = ('', 7575)
      httpd = HTTPServer(server_address, HMT333http)
      logging.info('HMT333 sensor HTTP server running')
      httpd.serve_forever()
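
For anyone wanting to reproduce the sensor side in isolation, here is a minimal, self-contained sketch of an HTTP endpoint on port 7575. The HMT333http handler is not shown in the post, so SensorHandler, make_server, run, and the placeholder readings are all assumptions made for illustration, not the original code.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer


class SensorHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for HMT333http, which is not shown in the post."""

    def do_GET(self):
        # Placeholder readings; a real handler would query the sensor here.
        body = json.dumps({"temperature": 21.5, "humidity": 40.2}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=7575):
    """Bind on all interfaces; passing port 0 asks the OS for any free port."""
    return HTTPServer(("", port), SensorHandler)


def run(port=7575):
    """Serve until interrupted; serve_forever() blocks the calling thread."""
    logging.basicConfig(level=logging.INFO)
    httpd = make_server(port)
    logging.info("Sensor HTTP server running on port %d", httpd.server_address[1])
    httpd.serve_forever()
```

Calling run() starts the server. Because serve_forever() only returns after shutdown() is called from another thread, no surrounding loop is needed in this version.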

I see this in the GitHub README for connector (might it be worth increasing the timeout?):

The default timeout for retrieving data is 2 seconds. You can change this by setting INTERNAL_HTTP_TIMEOUT to the number of seconds (e.g. 4 )

Though this has never been a problem before.
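
One way to test the "no response from the sensor" theory independently of the connector block is to request the sensor endpoint directly with the same timeout. This is only a sketch: probe is a hypothetical helper, and the URL in the usage note assumes the compose service name hmt333 resolves from inside another container on the device.

```python
import urllib.error
import urllib.request

# An opener with an empty ProxyHandler ignores any proxy environment variables,
# which is what we want for a service on the device's own container network.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return (ok, detail) for one GET against url, giving up after `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers refused connections, DNS failures and timeouts (URLError is an OSError).
        return False, f"no response: {exc}"
```

For example, probe("http://hmt333:7575/", timeout=2) run from another container would show whether the sensor answers within the default 2 seconds. If it does, the timeout setting is unlikely to be the cause.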

Grateful for any ideas, I just don’t know what else to try now.

Thanks
Mark.

Update: I tried setting the connector’s INTERNAL_HTTP_TIMEOUT to 10, but it made no difference.

My other thought here is to not use the connector block at all and just create a Telegraf container with a customised conf file to connect my sensor to InfluxDB, just so I can get things working again.

I have also just started to see this problem. Was there maybe an update to balenaOS that has broken something?
I have been thinking of building my own version of the connector block, because while it is quite good for plug-and-play scenarios, it is quite hacky if you want anything non-standard. This might just push me to write my own service.

Well, at least I’m glad it’s not just me :joy:

I had updated the operating system a few weeks back, but it had been working fine after this update. I also created a new device with an older OS and still had the problem. Very strange: it seemed to start after a device reboot, but the same device had been rebooted beforehand without the problem.

I understand the issue as follows:
When the connector block starts, the autowire.py script is run, which queries the balena host for a list of all the services that are running. It then uses this list of services to determine how to create the telegraf.conf file, which telegraf uses to know what to listen to and what to broadcast to. [Incidentally, I have been able to manually inject configs into the telegraf.conf file generation in the autowire.py script in order to get telegraf to broadcast to a service that isn’t set up by the device. Anyway…]
The line in autowire.py that is failing is the call to get a description of the device in order to extract the services that are running on the device (I think):

device = self.balena.models.device.get_with_service_details(device_id, True)

Then from this device object, the commit is extracted, which is used to get the release, which is used to get the list of services that are configured.
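
The chain described above (device, then commit, then release, then services) can be sketched roughly as follows. The three lookup functions and the dictionary keys are hypothetical stand-ins supplied by the caller; they are not the balena SDK’s real calls or field names. The point is only that a failure in the first lookup stops everything after it.

```python
def discover_service_names(device_id, get_device, get_release, get_services):
    """Follow device -> commit -> release -> services using caller-supplied lookups."""
    device = get_device(device_id)            # the step that fails in the traceback
    release = get_release(device["commit"])   # "commit" is an illustrative key
    return [svc["name"] for svc in get_services(release["id"])]


# Illustrative in-memory data only; none of this comes from a real device.
DEVICES = {"dev-1": {"commit": "abc123"}}
RELEASES = {"abc123": {"id": 42}}
SERVICES = {42: [{"name": "hmt333"}, {"name": "influxdb"}, {"name": "connector"}]}

names = discover_service_names(
    "dev-1", DEVICES.__getitem__, RELEASES.__getitem__, SERVICES.__getitem__
)
print(names)
```

If get_device raises, as the real call does here, no release or service list is ever fetched, so no telegraf.conf can be generated.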

I am now trying to work out why the connector’s request to get a description of the device is failing.

So, indeed the GET request to the device is failing to return 200 OK, and is instead returning 500.

URL: device
Method: GET
endpoint: https://api.balena-cloud.com/v6/
params: None
data: None
raw query: $filter=uuid%20eq%20'ee719195c42fXXXXXXXXXXXXXXXXXXXX'&
$expand=image_install($select=id,download_progress,status,install_date&
$filter=status%20ne%20'deleted'&
$expand=image($select=id&$expand=is_a_build_of__service($select=id,service_name)),is_provided_by__release($select=id,commit)),gateway_download($select=id,download_progress,status&$filter=status%20ne%20'deleted'&$expand=image($select=id&$expand=is_a_build_of__service($select=id,service_name)))

The response fails the response.ok check, which raises the exception that we see above:

exceptions.RequestError(response._content)

Why is the device GET request returning 500? No idea. Unfortunately 500 doesn’t tell you much.

Other calls to the device are successful, for example, the call to check if the device is in local mode. So, it is something specific to the get_with_service_details request. :person_shrugging:
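
This also explains why the traceback ends in the unhelpful b'': when the server answers 500 with an empty body, raising with only the body leaves nothing to print. A rough sketch of that behaviour follows, using stand-in classes rather than the SDK’s own (RequestError here, FakeResponse, check, and describe_failure are all hypothetical names).

```python
class RequestError(Exception):
    """Stand-in for balena.exceptions.RequestError; not the SDK class itself."""


class FakeResponse:
    """Minimal stand-in for an HTTP response object."""

    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self._content = content

    @property
    def ok(self):
        # Common convention: any status below 400 counts as success.
        return self.status_code < 400


def check(response):
    """Raise with only the raw body when the response is not OK."""
    if not response.ok:
        raise RequestError(response._content)
    return response._content


def describe_failure(response):
    """Report the status code alongside the body, which is more useful for a 500."""
    try:
        check(response)
    except RequestError as exc:
        return f"HTTP {response.status_code}: {exc.args[0]!r}"
    return "ok"
```

With an empty 500 response, the exception carries just b'', whereas describe_failure(FakeResponse(500)) keeps the status code in the message.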

I have the same problem on a Raspberry Pi 3.
The device was running for 4 months and then started showing this error. I tried updating everything to the latest version: OS, supervisor, etc.
Nothing worked. I’m pretty sure I’ve had this issue before and fixed it by creating a new device on Balena.io and pushing the project to it.
But this time, now that everything is updated, the error remains even after making a new device.

@lfarmvent thanks for walking through the issue in detail - I’ll see what I can find out about the failing get_with_service_details request.

The get_with_service_details() failure appears to be fixed in v12.3.1 and later of the balena Python SDK, but the connector block is pinned to version 11.3.2. We plan to update that and re-publish the block.
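
To check which SDK version a given container actually has, a comparison along these lines should work. It is a sketch under assumptions: the distribution name balena-sdk is an assumption about how the SDK is packaged, the helper names are made up, and the parser only handles plain dotted integer versions.

```python
from importlib import metadata


def parse_version(text):
    """Turn '12.3.1' into (12, 3, 1); assumes plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


FIXED_IN = parse_version("12.3.1")  # first fixed version, as quoted in the post


def has_fix(installed):
    """True if the installed version string is at or above the fixed version."""
    return parse_version(installed) >= FIXED_IN


def installed_sdk_version(dist_name="balena-sdk"):
    """Return the installed version string, or None if the package is absent.

    The default distribution name is an assumption, not confirmed by the thread.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Comparing tuples rather than strings matters here: as strings, "12.10.0" would sort before "12.3.1".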

@lfarmvent @mrcub The connector block has been updated and the error should be resolved for Raspberry Pi 3 (you’ll need to re-pull the image). However, @markysparks, the Pi 4 version still has the PluginBase error, which I’ll look into further.

Thank you alanb128. It is now working for me.

The Pi 4 version should now be working as well.

Thanks @alanb128, appreciate the effort.

Great news - thanks @alanb128