Monitoring on Cloud Foundry

Introduction

Monitoring Cloud Foundry involves several levels. First, we have to monitor each Cloud Foundry component. Second, we should track the runtime state of the machines that support Cloud Foundry. Finally, we have to monitor all the applications running on Cloud Foundry. These levels can be classified as follows:

1. machine monitoring : memory usage / CPU utilization / disk size

2. component monitoring : component status / app statistics / user statistics

3. application monitoring : app status / app instances / app bandwidth

In this chapter, we discuss how we monitor components on Cloud Foundry. First, we describe what component info consists of and classify it. Second, we introduce the methods we use to fetch component info. Last, we look at the future of component monitoring on Cloud Foundry.

Components Info for Monitoring

Every system produces many kinds of info: initialization info, runtime info, (valid) input info, output info, and exception info. Not all of it is useful for monitoring, so what do we operators really care about? Consider an example: when you have a Cloud Foundry deployment running, what does your manager ask you most often? The most frequent questions are probably these:

1. Is this Cloud Foundry deployment running well? (each component's status) What is its structure? (location and relationship of each component)

2. How many users are using our Cloud Foundry deployment, and who are they? (user info)

3. How many apps are running on Cloud Foundry? (app info)

Based on these questions, we can pick out the useful info and classify it. The overall map looks like this:

|-- Components

    |-- basic info

        |-- name

        |-- access uri

        |-- status

        |-- memory usage

        |-- start time

        |-- up time

    |-- Cloud Controllers

    |-- Health Managers

    |-- Service Gateways

        |-- Supported Versions

        |-- Provisioned Services

        |-- Nodes Info

    |-- Routers

        |-- Droplets

        |-- Requests

        |-- Bad Requests

        |-- 2XX Response

    |-- Droplet Execution Agents

        |-- App numbers

        |-- App Memory

    |-- Applications

        |-- Buildpack

        |-- instance

        |-- space name

    |-- Users

        |-- email

        |-- username

        |-- organization name

    |-- Spaces

        |-- space name

        |-- organization name

        |-- running apps

        |-- stopped apps

    |-- Logs

        |-- log path

        |-- size

        |-- content

Methods to Fetch Components Info

There are several ways to fetch component info. First, we can fetch each component's basic info, such as its name, access URI, and status, via the varz mechanism.
Second, we can connect to the NATS server and subscribe to specific channels to get the requested info. Last, we can fetch info from the database.

Here are some definitions before we go into the details; skip them if you already know them:

NATS : the message queue used for communication and registration among cf components.

The varz interface : a REST interface exposing all metrics of a cf component.

The Dashboard : fetches data when initialized and displays it as graphs on a web interface.

Varz Interface

The varz interface is generated by the VCAP::Component module, and it can be overridden in a specific module, such as health_manager.

The source code of the component module is here: click

In this file, we first start a Thin server that serves two HTTP endpoints, /varz and /healthz. Then we initialize a Varz object and invoke the update_varz function, which gets a dup of the varz safe hash. The safe hash is an important structure: it provides thread safety and guarantees a single varz per component.

We can use VCAP::Component.varz[XX] = XXX to set a value in the varz safe hash.
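To make the safe hash idea concrete, here is a minimal sketch of a hash guarded by a re-entrant lock. The class name SketchSafeHash and its snapshot method are hypothetical and written only for illustration; this is not the vcap-common implementation, just one way to get the same properties (locked writes, and readers working on a dup).

```ruby
require 'monitor'

# Hypothetical stand-in for the "safe hash" idea: a hash guarded by a
# re-entrant lock. This is NOT the vcap-common implementation.
class SketchSafeHash
  include MonitorMixin

  def initialize
    super()      # lets MonitorMixin set up its internal lock
    @data = {}
  end

  # Single writes and reads take the lock.
  def []=(key, value)
    synchronize { @data[key] = value }
  end

  def [](key)
    synchronize { @data[key] }
  end

  # Readers (for example a /varz handler) work on a copy, so they never
  # observe a half-finished update.
  def snapshot
    synchronize { @data.dup }
  end
end

varz = SketchSafeHash.new
varz[:requests] = 0

# Several threads update the same counter. The outer synchronize makes the
# read-modify-write sequence atomic; the monitor is re-entrant, so the
# nested locking inside [] and []= is fine.
threads = 4.times.map do
  Thread.new do
    1000.times { varz.synchronize { varz[:requests] += 1 } }
  end
end
threads.each(&:join)

copy = varz.snapshot
copy[:requests] = -1      # changing the copy leaves the original alone
puts varz[:requests]      # prints 4000
```

The explicit synchronize block around the increment mirrors how a multi-step update of varz would be grouped under one lock.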

Nats Mechanism

By connecting to the NATS server, we can subscribe to channels to get a lot of the info exchanged among components. For example, by sending a request on the vcap.component.discover channel, we can get the address of each component's varz endpoint.

Here are some NATS channel addresses:

click

This link lists all the NATS channels and message sequences in Cloud Foundry V1 and V2.
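As a rough illustration of what happens with a discover reply, the sketch below decodes one reply and prepares an authenticated request for the component's /varz endpoint. The reply layout used here ("host" as "ip:port" and "credentials" as a [user, password] pair) is an assumption for this example, and the sample values are made up. Nothing is sent over the network.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build an authenticated GET request for a component's /varz endpoint from
# one discover reply. The reply layout ("host" as "ip:port", "credentials"
# as [user, password]) is an assumption made for this sketch.
def varz_request(discover_reply_json)
  reply = JSON.parse(discover_reply_json)
  uri = URI("http://#{reply['host']}/varz")

  request = Net::HTTP::Get.new(uri)
  user, password = reply['credentials']
  request.basic_auth(user, password)
  [uri, request]
end

# Made-up reply, for illustration only.
sample = '{"type":"Router","host":"10.0.0.5:8080","credentials":["user","secret"]}'
uri, request = varz_request(sample)
puts uri

# Sending it would look like this (needs a reachable component):
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# varz = JSON.parse(response.body)
```

Keeping the request construction separate from the network call makes it easy to check the URL and credentials before contacting any component.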

Database Accessing

To access the database, we need one important thing: the database access address. It usually looks like this:

postgres://username:password@10.1.59.185/cloud_controller

However, this address is not easy to obtain, because it is kept private for security reasons.

After some investigation, we found that we can get it by analysing the varz info. The next section covers the details.

Once we have the database address, we can easily fetch the info we need.
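For readers unfamiliar with this address format, the following sketch splits such an address into the parts a database client needs. The helper name parse_db_address is hypothetical; it relies only on Ruby's standard URI library and uses the example address shown above.

```ruby
require 'uri'

# Split a database address into the parts a client library needs.
# parse_db_address is a hypothetical helper written for this example.
def parse_db_address(address)
  uri = URI.parse(address)
  {
    adapter:  uri.scheme,
    user:     uri.user,
    password: uri.password,
    host:     uri.host,
    port:     uri.port,                   # nil when the address has no port
    database: uri.path.sub(%r{\A/}, '')   # drop the leading slash
  }
end

parts = parse_db_address('postgres://username:password@10.1.59.185/cloud_controller')
puts parts[:host]       # prints 10.1.59.185
puts parts[:database]   # prints cloud_controller
```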

Health Manager Enhancement

This is an enhancement we made to health manager V2. When we monitored Cloud Foundry V1, we could get user info from the health manager's varz endpoint, but after moving to Cloud Foundry V2, we found that the varz endpoint no longer exposed user info. So we decided to enhance the health manager to expose user info (such as username and email) via the varz endpoint.

Here is the procedure:

For the health manager enhancement, we added a file named user_info.rb in the lib directory of the health manager project, and added three lines to health_manager.rb:

* Line27 require 'health_manager/user_info.rb'

* Line64 @user_info = UserInfo.new(@varz)

* Line106 @user_info.start

In the lib/user_info.rb file, we wrote several small functions that fetch user info and insert it into the varz safe hash. Here are the details:

First, we connect to the NATS server, send a request on the vcap.component.discover channel, and receive and decode all the responses. This gives us the address of each component's varz endpoint.

NATS.request("vcap.component.discover")

Second, we access each varz endpoint via an HTTP request to fetch monitoring info. Since we want the database address, we pick out the uaa varz response and use a regular expression to extract the uaa database access address.

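# Database settings from the decoded uaa varz response.
db = result[:data]["config"]["uaa"]["object"]["database"]
# The pattern is left unanchored: a greedy leading ".*" would cut the
# first octet of the address down to its last digit.
db_ip_port = /(\d+\.\d+\.\d+\.\d+:\d+)/.match(db["url"])[1]

return "postgres://#{db["username"]}:#{db["password"]}@#{db_ip_port}/uaa"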
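To see the extraction on concrete data, here is a self-contained sketch of this step. The nested keys follow the fragment above; the method name uaa_db_address, the sample values, and the JDBC-style "url" format are assumptions. The pattern is deliberately unanchored, since a greedy ".*" in front of the capture group would swallow the leading digits of the address.

```ruby
# Build a postgres:// address from a decoded uaa varz response.
# The sample values and the JDBC-style "url" format are assumptions.
def uaa_db_address(result)
  db = result[:data]["config"]["uaa"]["object"]["database"]
  match = /(\d+\.\d+\.\d+\.\d+:\d+)/.match(db["url"])
  return nil unless match   # no ip:port found in the url

  "postgres://#{db["username"]}:#{db["password"]}@#{match[1]}/uaa"
end

sample = {
  data: {
    "config" => { "uaa" => { "object" => { "database" => {
      "url"      => "jdbc:postgresql://10.1.59.185:5524/uaa",
      "username" => "uaa_user",
      "password" => "secret"
    } } } }
  }
}

puts uaa_db_address(sample)
# prints postgres://uaa_user:secret@10.1.59.185:5524/uaa
```

Returning nil when no address is found lets the caller skip a component whose varz output does not contain database settings.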

Third, we use DataMapper, a database access tool, to fetch user info.

DataMapper.setup(:default, @@uaa_db)
users = Users.all(:fields => [:username, :email])

The DataMapper documentation is available here: Click

Last, we use the VCAP::Component.varz.synchronize method to insert the user info into the varz object.

VCAP::Component.varz.synchronize do
  VCAP::Component.varz[:users] = result
end

Conclusion

In this paper, we have discussed monitoring on Cloud Foundry. We focused on component monitoring and described the different types of component info. We then covered the mechanisms for fetching that info from Cloud Foundry, and walked through how we modified the health manager's source code to expose the required info.