Sunday, 12 July 2020

I/O Analysis with Python and Seaborn

We had a typical situation a few days back: production load went up and CPU consumption was driven to almost 100%. The prime suspect was a mount point to which WebLogic was writing log files.

From the Solaris ioctl data we could see that the ZFS storage was not an issue. But, as they say, guilty until proven innocent: we had to prove to the client that the ZFS storage had not contributed to the problem.

For this, the systems engineers were asked to submit graphs that the business users could understand. The engineer went on to download the historical data as CSV files, import them into Excel, and plot graphs showing no correlation.

Oracle engineered systems do provide ZFS analytics, where you can see a detailed graphical analysis of the I/O, but in these work-from-home days the GUI was not accessible to our engineers over the VPN.

I had been on my own Python and data analysis journey for the last few months and thought that Python code could simplify things here.

Given below is the Python code to plot graphs from the ZFS ioctl CSV files and relate the various measurements.

The code below is from a Jupyter notebook environment.

############

# importing all needed libraries and sub-libraries, and assigning them aliases
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# reading the csv file and converting it into a pandas dataframe
#C:\Users\pychamp\zfscsv

zfs_df = pd.read_csv('C:\\Users\\pychamp\\zfscsv\\serv2.csv')

# calling head() on the data frame to see the first few data lines. This line can be commented out in your code

zfs_df.head()
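
# optional sketch (not in the original post): these csv exports sometimes carry
# leading/trailing spaces in the header row; if head() shows padded column names,
# stripping them avoids KeyErrors in the plots further down
zfs_df.columns = zfs_df.columns.str.strip()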


# inspecting the column types and non-null counts of the data frame. This line can be commented out in your code

zfs_df.info()
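
# optional sketch (an assumption, not from the original post): if the export also
# includes a timestamp column -- the name 'time' below is hypothetical, check the
# info() output above for the real column name -- a simple line plot of asvct over
# the incident window helps correlate the storage numbers with the CPU spike
if 'time' in zfs_df.columns:
    zfs_df['time'] = pd.to_datetime(zfs_df['time'])
    zfs_df.plot(x='time', y='asvct', figsize=(12, 4))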




# using catplot to plot the average service time (asvct) distribution
# we can see from the plot below that during the issue window the average service time was only around 0.7 ms
sns.catplot(x='asvct', kind='count', data=zfs_df, height=10)
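
# as a sketch (not part of the original analysis), a plain matplotlib histogram
# gives an alternative view of the same distribution: one bar per value range
# instead of one bar per distinct asvct value
plt.figure(figsize=(10, 4))
plt.hist(zfs_df['asvct'], bins=20)
plt.xlabel('average service time (ms)')
plt.ylabel('number of samples')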



# plotting the %busy (%b) data; there is no bin at 1 or higher, so %busy was essentially zero
sns.catplot(x='%b', kind='count', data=zfs_df, height=4)
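
# quick numeric cross-check of the same claim (a small addition, not in the
# original post): describe() prints the maximum %b value directly instead of
# reading it off the plot
zfs_df['%b'].describe()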



# introducing a new column in the data frame and populating it with the rounded asvct values
# since the variance in the asvct values is very high, we use rounding to reduce the number of levels
# here 'servicetime' is absent in the original csv file but is added to the pandas DataFrame at runtime

zfs_df['servicetime'] = zfs_df.asvct.apply(np.round)  # applying the numpy round function to the asvct field
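
# alternative sketch (an assumption, not from the original post): instead of
# rounding, pd.cut can bucket asvct into explicit ranges, which keeps the plot
# readable even if a few outliers are much larger; the bin edges below are a
# guess -- tune them to your own data
zfs_df['svc_bucket'] = pd.cut(zfs_df['asvct'],
                              bins=[0, 1, 2, 5, 10, np.inf],
                              labels=['<1', '1-2', '2-5', '5-10', '>10'],
                              include_lowest=True)
sns.catplot(x='svc_bucket', kind='count', data=zfs_df, height=6)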




# plotting the rounded service time now
# this shows that most of the service calls finished in less than 1 ms
sns.catplot(x='servicetime', kind='count', data=zfs_df, height=10)




# plotting the relation between service time and kilobytes read (kbr)
sns.catplot(x='kbr', kind='count', data=zfs_df, hue='servicetime', height=10)
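
# another way to relate the two measurements (a sketch, not from the original
# post): a scatter plot of kilobytes read against the raw service time shows
# whether the larger reads were also the slower ones
sns.relplot(x='kbr', y='asvct', data=zfs_df, height=6)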





As can be seen from the chart above, most of the read calls to this mount point finished in under 1 ms, as denoted by the majority of orange bars.

Use this code, experiment with other historical data like OSWatcher, ExaWatcher, etc., and do share it in your own posts and blogs....

Till then ... happy coding :)
