We had a typical situation a few days back: a production load went up and CPU consumption was driven to almost 100%. The suspect was pointed out to be a mount point where WebLogic was writing log files.
From the Solaris ioctl data we could see that ZFS storage was not an issue. But, as they say, guilty until proven innocent: we had to prove to the client that the ZFS storage did not contribute to the problem.
For this, the systems engineers were asked to submit graphs that the business users could understand. An engineer went on to download the historical data as CSV files, import them into Excel, and plot graphs showing no correlation.
Although Oracle engineered systems provide ZFS Analytics, where you can see detailed graphical analysis of the I/O, these being work-from-home days the GUI was not accessible to our engineers over the VPN.
I had been on my own Python and data analysis journey for the last few months, and thought that a little Python code could simplify things here.
Given below is the Python code to plot graphs from the ZFS ioctl CSV files and relate the various measurements.
The code below is from a Jupyter notebook environment.
############
# Importing all needed libraries and their sub-modules, and assigning them aliases
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Reading the csv file and converting it into a pandas dataframe
# csv files are kept in C:\Users\pychamp\zfscsv
zfs_df = pd.read_csv('C:\\Users\\pychamp\\zfscsv\\serv2.csv')
# Calling head() on the dataframe to see a few data lines. This line can be commented out in your code
zfs_df.head()
# Describing the data read from the csv file. This line can be commented out in your code
zfs_df.info()
# Using catplot to plot the distribution of average service time (asvct);
# we can see from the chart below that during the issue time the average service time was just 0.7 ms
sns.catplot(x='asvct', kind='count', data=zfs_df, height=10)
# Plotting the %b (percent busy) data;
# we can see that there is no bin with a value of 1 or more, so %busy was zero
sns.catplot(x='%b', kind='count', data=zfs_df, height=4)
# Introducing a new column in the dataframe: rounding asvct and populating the column with the result.
# Since the variance in the asvct values is very high, rounding is used to reduce the number of levels.
# 'servicetime' is absent in the original csv file but is added to the pandas dataframe at runtime.
zfs_df['servicetime'] = zfs_df.asvct.apply(np.round)  # applying the numpy round function to the asvct field
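# (Optional alternative, not from the original post) pandas' cut() can bin asvct into explicit,
# labelled latency buckets instead of plain rounding; the bin edges below are only an example
# and values outside the edges become NaN.
zfs_df['latencybucket'] = pd.cut(zfs_df.asvct, bins=[0, 1, 2, 5, 10, 50],
                                 labels=['<1ms', '1-2ms', '2-5ms', '5-10ms', '10-50ms'])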
# Plotting the rounded service time now;
# it shows that most of the service calls finished in less than 1 ms
sns.catplot(x='servicetime', kind='count', data=zfs_df, height=10)
# Plotting the relation between service time and kilobytes read (kbr)
sns.catplot(x='kbr', kind='count', data=zfs_df, hue='servicetime', height=10)
As can be seen from the above chart, most of the read calls to this mount point finished in under 1 ms, as denoted by the majority of orange lines.
Use this code and experiment with other historical data such as OSWatcher, ExaWatcher, etc., and do share it in your own posts and blogs; a small starting sketch for OSWatcher-style data follows below.
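Here is a minimal sketch of the same idea applied to OSWatcher-style data. It assumes you have already extracted an iostat-type archive into a CSV with 'device' and 'await' (average wait time in ms) columns; the path, file name and column names are only placeholders, so adjust them to whatever your own extract actually contains.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# placeholder path and column names -- replace with your own OSWatcher/ExaWatcher extract
osw_df = pd.read_csv('C:\\Users\\pychamp\\oswcsv\\iostat.csv')
# round the average wait time to reduce the number of levels, the same trick used for asvct above
osw_df['waittime'] = osw_df['await'].apply(np.round)
# count plot of the rounded wait times, split per device
sns.catplot(x='waittime', kind='count', data=osw_df, hue='device', height=8)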
Till then ... happy coding :)