Human eyes are good at picking out colors and patterns. We can quickly tell a red region from a green one or a circle from a square. Now that massive amounts of data are generated every single day, data visualization helps hold our attention and keeps the focus on the message, which supports data-driven decisions. Many data visualization techniques and tools, such as charts, graphs, and maps, provide an accessible way to identify trends, outliers, and patterns in data. Another popular data representation technique is the histogram, which gives an estimate of the probability distribution of a continuous variable. In this article, we will show you two different ways to create a SAS histogram. But first, let’s take a look at some of the most common types of data representations out there.
Types of Data Representations
1. Bar Chart
A bar chart uses horizontal or vertical bars to show quantities such as amounts and frequencies. The bars can be single or grouped. The most common use of bar charts is to compare different items: the length of each bar makes it easy to see which categories are larger or smaller than the others.
Fig: Bar chart (source)
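As a rough illustration in SAS, a bar chart of category counts could be drawn with PROC SGPLOT. This is a minimal sketch, assuming the sample dataset SASHELP.CARS and its categorical variable Type are available in your SAS session; the choice of dataset and variable is ours, not part of the description above.

```sas
/* Sketch: vertical bar chart of the number of cars in each Type category. */
/* SASHELP.CARS and the variable Type are assumptions for illustration.    */
proc sgplot data=sashelp.cars;
   vbar type;
run;
```

Replacing VBAR with HBAR should give the horizontal version of the same chart.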
2. Histogram
A histogram looks similar to a bar graph, but the two differ in an important way. A bar graph shows the frequency of categorical data (gender, months, etc.), whereas a histogram is used for quantitative data (numeric variables such as height or weight) grouped into intervals.
Fig: Histogram (source)
3. Line Graph
A line graph uses lines and points to represent change over time. It can be used to show, for example, the growth of the world population, the number of animals left on earth, or the growing volume of data generated each day. It gives you an idea of how a quantity changes over time.
Fig: Graph of speed versus time (source)
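In SAS, a line graph over time could be sketched with the SERIES statement of PROC SGPLOT. The example below assumes the sample dataset SASHELP.STOCKS, with its Date, Close, and Stock variables, is available; it is only one possible illustration and is unrelated to the speed-versus-time figure.

```sas
/* Sketch: one line per company showing the closing price over time. */
/* SASHELP.STOCKS and its variables are assumptions for illustration. */
proc sgplot data=sashelp.stocks;
   series x=date y=close / group=stock;
run;
```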
4. Pie Chart
A pie chart is a circular statistical graphic used to represent numerical proportions. Pie charts are often used to show percentages of a whole at a set point in time. Unlike some other data representations, however, they do not show changes over time. In most cases a pie chart can be replaced by another representation such as a bar chart, box plot, or dot plot.
Fig: Pie chart of populations of English native speakers (source)
5. Frequency Distribution Table
A frequency distribution table summarizes the values in a dataset and how often each one occurs. It usually has two or three columns. The first column lists the possible outcomes as individual values, while the second column lists the frequency of each outcome. This data representation gives you a snapshot of the data to help you identify patterns.
Rank | Degree of agreement | Number
1 | Strongly agree | 23
2 | Agree somewhat | 31
3 | Not sure | 22
4 | Disagree somewhat | 19
5 | Strongly disagree | 16
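A table of this kind can be produced in SAS with PROC FREQ. The following is a minimal sketch, assuming the sample dataset SASHELP.CARS and its variable Type; it does not reproduce the survey data shown above.

```sas
/* Sketch: frequency distribution table of the Type variable.       */
/* NOCUM drops the cumulative columns so only the frequency and      */
/* percentage of each outcome are listed.                            */
proc freq data=sashelp.cars;
   tables type / nocum;
run;
```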
6. Scatter Plot
A scatter plot is a type of plot or mathematical diagram that displays the values of two variables as points in a Cartesian plane. If the points are coded (by color, shape, or size), one additional variable can be displayed. The data is shown as a collection of points: the value of one variable determines the position on the horizontal axis and the value of the other variable determines the position on the vertical axis.
Fig: Scatter Plots (source)
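As a sketch of how this could look in SAS, the SCATTER statement of PROC SGPLOT plots one variable on each axis, and the GROUP= option codes the points by a third variable. The dataset SASHELP.CARS and the variables Horsepower, MPG_City, and Type are assumptions chosen for illustration.

```sas
/* Sketch: horsepower on the horizontal axis, city mileage on the    */
/* vertical axis, with points color-coded by car type.               */
proc sgplot data=sashelp.cars;
   scatter x=horsepower y=mpg_city / group=type;
run;
```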
7. Box Plot
A box plot is a graphical representation of the locality, spread, and skewness of numerical data through their quartiles. In addition to the box, a box plot may have lines extending from the box, called whiskers, that indicate variability outside the upper and lower quartiles. Outliers that differ significantly from the rest of the dataset can be plotted as individual points.
Fig: Box plots (source)
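A box plot of this kind could be sketched in SAS with the VBOX statement of PROC SGPLOT. The example assumes the sample dataset SASHELP.CARS, with Horsepower as the numeric variable and Type as the grouping category; both are our choices for illustration.

```sas
/* Sketch: one vertical box per car type, showing the quartiles,     */
/* whiskers, and outliers of horsepower within each type.            */
proc sgplot data=sashelp.cars;
   vbox horsepower / category=type;
run;
```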
SAS Histogram
A SAS histogram helps you explore your data by displaying the distribution of a continuous variable across intervals (bins) of its values. It can be created using PROC UNIVARIATE, PROC CHART, or PROC GCHART.
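Of the procedures listed, PROC CHART is the one that draws a text-based chart in the output listing. A minimal sketch, assuming the sample dataset SASHELP.CARS and an arbitrarily chosen number of bars, might look like this:

```sas
/* Sketch: text-based histogram of horsepower with PROC CHART.       */
/* LEVELS=8 asks for eight midpoints; the value 8 is an assumption   */
/* made for illustration and can be changed freely.                  */
proc chart data=sashelp.cars;
   vbar horsepower / levels=8;
run;
```

The rest of this article uses PROC UNIVARIATE, which produces a graphical histogram.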
Syntax
The syntax to create a histogram in SAS is:
PROC UNIVARIATE DATA = DATASET;
   HISTOGRAM variables;
RUN;
Where “DATASET” is the name of the dataset used and “variables” are the names of the variables whose values are plotted in the histogram.
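To make the template concrete, here is a minimal sketch that fills in both placeholders. It assumes the sample dataset SASHELP.CARS and two of its numeric variables, Horsepower and MPG_City; listing more than one variable should produce one histogram per variable.

```sas
/* Sketch: DATASET is replaced by SASHELP.CARS and "variables" by    */
/* two numeric variables. NOPRINT suppresses the tables of summary   */
/* statistics so that only the histograms are produced.              */
proc univariate data=sashelp.cars noprint;
   histogram horsepower mpg_city;
run;
```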
Simple Histogram
A simple histogram can be created by specifying the name of the variable and the midpoints used to group its values. In this example, we plot the variable “horsepower” with bin midpoints running from 176 to 350 in steps of 50, so the values are grouped into intervals that are 50 units wide.
proc univariate data = sashelp.cars;
   /* bin midpoints at 176, 226, 276, and 326 */
   histogram horsepower / midpoints = 176 to 350 by 50;
run;
By executing the code above, you will get the following output:
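Bins can also be described by their edges rather than their centers. The sketch below uses the ENDPOINTS= option of the HISTOGRAM statement; the range 50 to 550 is an assumption chosen to be wide enough for the horsepower values and is not taken from the example above.

```sas
/* Sketch: bins defined by their endpoints (50, 100, ..., 550)       */
/* instead of their midpoints. The range is an illustrative guess.   */
proc univariate data=sashelp.cars noprint;
   histogram horsepower / endpoints = 50 to 550 by 50;
run;
```

Endpoints can be easier to read when you want bars that start and stop at round numbers.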
Histogram With Curve Fitting
We can also overlay a fitted distribution curve on the SAS histogram using some additional options. For example, let us fit a normal curve whose mean (mu) and standard deviation (sigma) are both set to EST, which tells SAS to estimate these parameters from the data.
proc univariate data = sashelp.cars noprint;
   histogram horsepower
      / normal (mu = est sigma = est color = blue w = 2.5)   /* fitted normal curve */
        barlabel = percent                                   /* label each bar with its percentage */
        midpoints = 70 to 550 by 50;
run;
By executing the code above, you will get the following output:
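A normal curve is not the only overlay available. As a hedged sketch, the KERNEL option of the HISTOGRAM statement adds a nonparametric kernel density estimate, which can be compared with the fitted normal curve to judge how well the normal distribution describes the data. The combination below is our illustration, not part of the example above.

```sas
/* Sketch: overlay both a fitted normal curve and a kernel density   */
/* estimate on the same histogram of horsepower.                     */
proc univariate data=sashelp.cars noprint;
   histogram horsepower / normal(mu=est sigma=est) kernel;
run;
```

If the two curves differ noticeably, for instance because the data are skewed, the normal fit may be a poor summary of the distribution.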
Want to Learn More?
One of the most important skills required for anyone in the data science field is data visualization. In this article, we have discussed the most common types of data representations and two different ways to create a SAS histogram. SAS also offers text styles, colors, and many other options that can be added to a histogram for better readability. If you want to dive deep into this subject, you can check out Simplilearn’s Data Scientist Master's Program designed in collaboration with IBM. It features exclusive IBM hackathons, masterclasses, Ask-me-anything sessions, live interaction with practitioners, practical labs, and projects. Get started with this course today and accelerate your career in data science.