Cloud Infrastructure Tutorial

3.1 Infrastructure

Hello and welcome to the Infrastructure module of the CompTIA Cloud Plus course offered by Simplilearn. In this module, we will look into the infrastructure of cloud computing. In the next slide, we will look into the objectives of this module.

3.2 Objectives

By the end of this module, you will be able to: describe infrastructure and its components; explain the concepts of storage configuration and provisioning; identify tools for troubleshooting basic network connectivity issues; and identify the common network protocols and types of networks. Let us begin by understanding the concept of cloud infrastructure in the next slide.

3.3 Cloud Infrastructure

Cloud infrastructure is the set of technologies used by cloud service providers to provide a seamless service. Virtualization, discussed previously, is a part of the physical infrastructure of cloud computing. The other components of cloud infrastructure are: storage technologies, access protocols, file systems, network protocols and port numbers, and common hardware resources and features. In the next slide, let us find out why online storage is preferred over the conventional storage methods.

3.4 Conventional vs. Cloud Storage Technology

Cloud storage technology is preferred over its more conventional counterpart. The reasons can mainly be categorized into affordability, storage location, usability, and encryption. Affordability: An online storage facility is more affordable, as it offers more storage space than the conventional storage available at the same price. Storage location: The conventional storage location is less secure when compared to online storage. Usability: The usability is similar in both technologies, as no extra technical knowledge is required to use online storage. Encryption: This is a layer of security used for data protection. While it is rarely used in conventional storage methods, it is widely used in online storage technologies. We will begin with an interaction to describe the different storage technologies in the following slide.

3.5 Different Storage Technologies

The different storage technologies are: Direct-Attached Storage (DAS), Network-Attached Storage (NAS), and Storage Area Network (SAN). Direct-attached storage (DAS) is computer storage that is connected to one computer and not accessible to other computers. For an individual computer user, the hard drive is the usual form of direct-attached storage. Although SAN and NAS address the storage issue by providing networked storage, DAS is still a popular method of attaching storage to a server. This is especially so after recent developments in attached storage; in 2013, for example, EMC announced a PCIe-based solid-state storage product for enterprise storage. Networked storage servers use this device to store data locally. This server can be set up on a dedicated OS or on Windows OS. This type of setup is also used for imokem. Imokem refers to the media server, where media files like video, audio, or image files are stored. This setup can only perform streaming, and is incapable of performing and initiating file downloads. An example of DAS is the local hard disk used in your system. The table here shows the advantages and disadvantages of DAS. The advantages are: It is local, non-networked storage installed in, or attached directly to, the PC or the server. The block-level access provided by DAS means users have access to raw space and not a file system. DAS allows exclusive storage access to the server it is attached to, which results in high availability and access rate. The disadvantages of DAS are: Its locally attached storage cannot be shared. If one server runs out of disk capacity and another has spare capacity, storage cannot be balanced between them. Each server’s DAS is independent. This complicates data protection, as data has to be copied across the network or backed up locally in each server with an application.
Network-Attached Storage (NAS) refers to a computer device dedicated exclusively to shared file storage. NAS is a hard disk storage device with its own network address and is used by other devices on the network for file-based data storage. The specialty of NAS is that it provides file-level access to the users. A NAS device is given an IP address on a LAN. Files and application programs on the NAS are served faster, as they do not compete with other functions for processor resources. NAS devices do not have a keyboard or a display and are configured with a web browser. NAS can be set up on a dedicated OS or on Windows OS. The hardware of network-attached storage includes multi-disk RAID systems. NAS software can handle different network protocols including: Internetwork Packet Exchange (Microsoft), Netware Internetwork Packet Exchange (Novell), and Network File System (Sun Microsystems). While NAS can run a standard OS like Windows, the NAS platforms of certain organizations run their own proprietary OS. For example, Data ONTAP is used by NetApp’s NAS platforms. NAS management utilities are the control panels that allow software administrators to configure and manage the NAS. These utilities help in managing multiple and heterogeneous (from different vendors) NAS boxes. This eases the burden on an administrator when more storage is added to the infrastructure. Moving a level up from NAS, SAN is a more sophisticated storage system. Storage Area Network (SAN) is an entire high-speed shared network dedicated to storage. SAN gives users block-level access by interconnecting different storage devices with their associated data servers for a large user network. Here, the storage device appears to be attached to the OS. Sub-networks of NAS systems can be incorporated in a SAN. For an enterprise, SANs are generally part of the network connecting all computing resources.
A SAN can either be clustered near other resources, such as IBM z990 mainframes, or serve as backup and archival storage in a remote location. The latter is achieved with the help of WAN carrier technologies such as ATM or SONET. The communication technology used in SANs can be an existing technology such as IBM’s ESCON, or Fibre Channel. A SAN has been compared to the common storage bus that is shared by different storage devices on a computer. Client systems require special hardware: they use a Host Bus Adapter (HBA) to interact with the SAN box. An HBA is a Peripheral Component Interconnect (PCI) add-on card that is connected to the PCI bus of the host system to connect it to the SAN. Each HBA has a unique World Wide Name (WWN), which is an 8-byte identifier used to identify the system. SAN supports Logical Unit Number (LUN) masking and zoning. Zoning can control the following: access from one node to another, isolation of a single server to either a single storage device or a group of storage devices, or association of a set of servers with one or more storage devices. LUN masking provides more detailed security than zoning because LUNs allow the sharing of storage at the port level. SANs help networks in: disk mirroring, backup and restoring, archiving and retrieval of archived data, data migration from one storage device to another, and data sharing among network servers. Let us compare the three storage technologies in the next slide.

3.6 Storage Technologies Comparison

The table here compares the three storage technologies. The comparison is based on four properties: storage, type of storage, user access, and data sharing. In NAS, a computer device is dedicated solely to shared file storage. In SAN, an entire high-speed shared network is dedicated to storage. In DAS, the storage is local and non-networked. NAS and SAN both provide network storage, while DAS has locally attached storage, which means it cannot be shared. User access is at the file level for NAS and at the block level for SAN, while DAS provides access to raw space. While NAS permits data sharing, in the case of SAN it depends on the Operating System (OS). In DAS there is no data sharing, as it is local storage. We will look at an interaction to describe the different access protocols in the following slide.

3.7 Different Access Protocols

The different access protocols are: Fibre Channel, Ethernet, Internet Protocol, Fibre Channel over Ethernet, and internet Small Computer System Interface. We will begin with the first access protocol, Fibre Channel. Fibre Channel (FC) can transmit data between devices at a rate of 4 Gbps. This speed is expected to increase to 10 Gbps in future. Fibre Channel is considered the main competitor of Ethernet and is a popular technology for a SAN. It is gradually replacing the Small Computer System Interface (SCSI) and is now the favored interface between servers and clustered storage devices. The SCSI commands that ride atop the FC transport are sent via the Fibre Channel Protocol (FCP). In order to increase performance, this protocol takes advantage of hardware that can utilize Protocol Off-Load Engines (POEs). This assists the host by offloading processing cycles from the CPU, thereby improving system performance. The table here lists the advantages and disadvantages of the Fibre Channel protocol. The major advantages are: It is three times faster than SCSI and is highly flexible when optical fibre is used as the physical medium. When optical fibre is used, the signal can travel a larger distance of ten kilometers or six miles. For short distances, it also works on coaxial cable and the ordinary twisted pair used in telephone cables, and therefore does not require optical fibre. FC offers point-to-point, switched, and loop interfaces. It is designed to interoperate with SCSI, the Internet Protocol (IP), and other protocols. However, the disadvantage of using FC is that manufacturers may interpret specifications differently and implement them accordingly. This leads to compatibility issues. Ethernet, invented by Robert Metcalfe, is a networking technology used in LANs. Ethernet was first deployed on a large scale in the 1980s.
At that time, it supported a maximum theoretical data rate of 10 megabits per second (Mbps). Subsequently, this increased to 100 Mbps under “Fast Ethernet” standards. Today, this figure can reach a peak of 1000 Mbps, a speed enabled by Gigabit Ethernet technology. Components such as the Ethernet frame structure, the physical layer, and the corresponding MAC operation make up the Ethernet protocol, also known simply as Ethernet. The run length of individual Ethernet cables is limited to roughly 100 meters, but Ethernet networks can be easily extended to link entire schools or office buildings using network bridge devices. A data packet on the Ethernet is called an Ethernet packet. Data travels over Ethernet inside protocol units called frames. Higher-level network protocols like Internet Protocol (IP) use Ethernet as their transmission medium. Internet Protocol operates at the network layer of the OSI model and provides unique addresses and traffic routing capabilities. A system using Internet Protocol version 4 is assigned a 32-bit IP address. The management differences between individual protocols or devices depend entirely on the requirements of the infrastructure. Fibre Channel over Ethernet (FCoE) eliminates the need for organizations to run parallel network infrastructure for their LAN and SAN. The FCoE standard protocol combines FC (used in SAN) and the Ethernet protocol (used in TCP/IP networks). It allows FC traffic to use the already existing high-speed Ethernet infrastructure. The storage and IP protocols of FC and Ethernet merge and use the same cables and interfaces. FCoE is aimed at: consolidating I/O; reducing switch complexity; and drastically reducing the number of cables, interface cards, switches, and adapters.
FCoE combines Ethernet and FC by: using a lossless Ethernet fabric and its own frame format; retaining FC’s device communication but substituting FC links between devices with high-speed Ethernet links; and working with standard Ethernet cards, cables, and switches to handle FC traffic at the data link layer. For example, FCoE encapsulates, routes, and transports FC frames across an Ethernet network using Ethernet frames. This is done from switches equipped with FC ports and attached devices to other similarly equipped switches. Under FCoE, organizations can have either FC and Ethernet traffic on the same cable, or FC and Ethernet traffic on different cables on the same hardware. However, organizations have been slow to adopt FCoE, as end-to-end FCoE devices are not easily available. Additionally, they may be reluctant to change their existing network infrastructure and management. The internet Small Computer System Interface (iSCSI) protocol enables the transfer of SCSI packets over a TCP/IP (Ethernet) network. Small Computer System Interface (SCSI) has been a standard client-server protocol for decades. It is used to enable computers to communicate with storage devices. As system interconnects move from the classical bus to a network structure, SCSI has to be mapped to network transport protocols. Today’s IP Gigabit networks meet the performance requirements, and seamlessly transport SCSI commands between application servers and centralized storage. iSCSI is an interoperable solution which enables the use of existing TCP/IP infrastructure and addresses distance limitations. It can also be used over the Internet. This means the disk drives in the SAN are presented to server applications over the existing Ethernet network, as though the disks were local to the physical server hardware.

3.8 iSCSI Protocols Benefits

The iSCSI protocol provides numerous benefits for SANs as compared to using Fibre Channel. Firstly, iSCSI uses familiar networking standards, such as Ethernet and TCP/IP. Most IT administrators are already familiar with TCP/IP, unlike the more complex skills required for FC storage. The next benefit is that the total storage costs are reduced. iSCSI SANs are easier to install and maintain than Fibre Channel SANs, which lowers the installation and maintenance expenses. iSCSI also reduces the necessity of hiring or outsourcing storage administration. Yet another benefit is that replication works over a standard IP network. iSCSI replication eliminates distance limitations and the costs associated with Fibre Channel routers. Lastly, iSCSI scales up to 10 Gigabit. For enterprise applications that require high transactional performance, 10 Gigabit Ethernet, also called 10 GigE, is available, thus expanding the performance of iSCSI storage networks to equal the performance of Metro and Wide Area Networks (WAN). Gigabit Ethernet (GigE) is the fastest Ethernet configuration available in the market, with speeds ranging from 1 Gbps to 100 Gbps. We will move on to Storage Configuration and Provisioning in the next slide.

3.9 Redundant Array of Independent Disks

Redundant Array of Independent Disks (RAID) is a type of storage system usually adopted by small and medium enterprises. RAID combines two or more disk drives into one logical disk drive. RAID provides features like fault tolerance, integrity of data, and data redundancy by duplicating the stored data across multiple disks. Thus, it provides a balance between recovery and performance. RAID is categorized into different levels: level 0 is a Striped Disk Array without Fault Tolerance; level 1 is Mirroring and Duplexing; level 2 is Error-Correcting Coding; level 3 is Bit-Interleaved Parity; level 4 is Dedicated Parity Drive; level 5 is Block Interleaved Distributed Parity; level 6 is Independent Data Disk with Double Parity; level 0+1 is a mirror of stripes; and finally, level 1+0 is a stripe of mirrors. Let us discuss the RAID levels individually, beginning with level 0, in the next slide.

3.10 RAID Level 0

Level 0 has a higher data throughput because the data is striped across drives. This level performs very well because it stores no redundant data. However, there is data loss in case of a disk failure in the array: a single drive failure will result in the loss of all data. Striping is the common name given to this level. The ‘read’ and ‘write’ performance offered in RAID level 0 is very high. It is easy to implement. Let us continue our discussion of this level with a figure in the next slide.

3.11 RAID Level 0 (contd.)

Let us understand the working of this level by considering a scenario, as depicted in the figure on the slide. In this level, there is a server which is responsible for accepting the data and storing it in the RAID disks. A RAID controller is generally used to stripe the data across multiple data stores. Here, the data stores refer to the hard drives in which the data is to be stored. The data stream is accepted by the RAID controller, which in turn converts the data into multiple stripes. Here, a stripe refers to a data chunk. In the figure, the stripes A, B, C, D, and so on show how the data is stored in this level. Stripe A is written to disk 1, stripe B to disk 2, stripe C to disk 3, stripe D to disk 4, and so on. Let’s learn about RAID level 1, that is, mirroring and duplexing, in the next slide.
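The striping scenario described for RAID level 0 can be sketched in a few lines of Python. This is only an illustrative model, not how a real RAID controller is implemented: the function names `stripe` and `unstripe`, the in-memory lists standing in for disks, and the chunk size are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and deal them round-robin across the disks."""
    if num_disks < 1 or chunk_size < 1:
        raise ValueError("num_disks and chunk_size must be positive")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk 0 goes to disk 0, chunk 1 to disk 1, and so on, wrapping around.
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the chunks back in the same order."""
    chunks = []
    depth = max((len(disk) for disk in disks), default=0)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


if __name__ == "__main__":
    layout = stripe(b"ABCDEFGH", num_disks=4, chunk_size=1)
    for number, disk in enumerate(layout, start=1):
        print(f"disk {number}: {disk}")
    print(unstripe(layout))
```

Because no chunk is stored twice, losing any one of the lists in `layout` makes the original data unrecoverable, which mirrors the lack of fault tolerance noted for this level.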

3.12 RAID Level 1 Advantages and Disadvantage

Redundancy is provided in this level, as the data is written to multiple drives. With more than one drive, ‘read’ operations are faster and ‘write’ operations are slower than with a single drive. However, there is no data loss if one or more drives fail, as long as at least one drive holding a copy survives. This level requires a minimum of two drives, which makes it a good entry-level redundant system. The per-megabyte cost is high here, as an entire drive is used to store the duplicate data. Mirroring is the name commonly used for this level. The table on the slide lists the advantages and the disadvantage of this level. The advantages of this level are: no parity generation, easy implementation, extreme fault tolerance, and utilization of full disk capacity. The only disadvantage of this level is the inefficient use of disk space, as every block of data is written twice. Parity generation is a step to check and correct any errors encountered in read-write operations from or to disk drives. Let us continue our discussion of this level with a figure in the next slide.

3.13 RAID Level 1

The setup depicted in the figure on the slide is similar to RAID 0, with a difference in how the data is maintained. Here in RAID 1, the setup takes up the responsibility of making mirror disks from the collection of data stores. Once the primary and secondary data stores are confirmed, the stripes are stored in the primary and secondary data stores simultaneously. This is shown in the figure, where the data stripe A is mirrored to data stripe B. The procedure in which the data write operation is performed on the primary and secondary disks simultaneously is termed mirroring. Let us learn about RAID level 2, that is, error-correcting coding, in the next slide.
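The simultaneous write to the primary and secondary data stores can be modelled with a small Python class. This is a hedged sketch only: `MirroredArray`, its methods, and the dictionaries standing in for disks are hypothetical names introduced for illustration, and a real controller would also handle rebuilds and write ordering.

```python
class MirroredArray:
    """Toy model of RAID 1: every write goes to all healthy mirror disks."""

    def __init__(self, copies: int = 2) -> None:
        if copies < 2:
            raise ValueError("mirroring needs at least two disks")
        # Each dict maps a block number to the data stored on that disk.
        self.disks: list[dict[int, bytes]] = [{} for _ in range(copies)]
        self.failed: set[int] = set()

    def write(self, block_no: int, data: bytes) -> None:
        # The same block is written to the primary and every secondary disk.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block_no] = data

    def fail(self, disk_index: int) -> None:
        """Simulate a drive failure by discarding that disk's contents."""
        self.failed.add(disk_index)
        self.disks[disk_index].clear()

    def read(self, block_no: int) -> bytes:
        # Any surviving mirror can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block_no]
        raise OSError("all mirrors have failed")


if __name__ == "__main__":
    array = MirroredArray(copies=2)
    array.write(0, b"stripe A")
    array.fail(0)            # the primary disk is lost
    print(array.read(0))     # the data is still served from the secondary
```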

3.14 RAID Level 2

The space efficiency of RAID level 2 is higher than that of level 1 but lower than that of the other levels. This level uses Hamming code, which is a complicated algorithm, in place of simple parity for data validation. A Hamming code occupies more disk space, as it is larger than parity. However, it can recover data lost from multiple drives if the code is designed properly, so this level can retain data even when multiple drives fail. RAID-2 maintains two components: data store collections and Error Correcting Code (ECC) memory. The ECC memory is immune to errors and is a type of disk storage used to detect any type of data corruption during a write operation. The disadvantage of this level is that Hamming code generation requires more CPU power than parity generation. Let us continue our discussion of this level with a figure in the next slide.

3.15 RAID Level 2 (contd.)

In the figure on the slide, each word of data is stored in stripes. For example, word A is stored in the form of stripes A0 to A3. When the data is stored, the error-correcting code is generated simultaneously and is stored in ECC memory. In this case, for data word A, the ECC is ECC/Ax to ECC/Az. The penalties of a RAID-4 array are also present in a RAID-2 array. However, in a RAID-2 array the penalty for write performance is larger. Write performance penalty refers to the time required to write and commit specific data. The Hamming code cannot be updated on a regular basis, which leads to the increased write performance penalty in RAID-2. Data for the new Hamming code is generated using all the data blocks in the stripe that the write modifies. These data blocks must be read in and then used to generate the code. For larger writes, the CPU time taken for Hamming code generation is higher than the parity generation time. This can increase the time taken for larger writes. In the next slide, we will move on to RAID level 3, that is, bit-interleaved parity.
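To give a feel for how an error-correcting code differs from simple parity, here is a minimal Python sketch of the classic Hamming(7,4) code, which adds three check bits to four data bits and can locate and correct one flipped bit. It is an assumption that this particular code is a fair stand-in for the figure: the slide does not say which Hamming code a RAID-2 array uses, and the function names are hypothetical.

```python
def hamming74_encode(data_bits: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword: list[int]) -> list[int]:
    """Fix at most one flipped bit and return the 4 recovered data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the error.
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    damaged = list(stored)
    damaged[4] ^= 1                      # simulate one corrupted bit
    print(hamming74_correct(damaged) == word)
```

Note that three check bits are needed for four data bits here, which illustrates why the code occupies more space than a single parity bit and costs more CPU time to generate.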

3.16 RAID Level 3

RAID-3, also known as Bit-Interleaved Parity, can be defined as byte-wise or bit-wise striping with parity. In this level, the array performance is increased significantly using spindle sync. This is possible because each operation occupies all the drives. Spindle sync is a concept that ensures all the disks spin at the same speed in terms of revolutions per minute. The array block size needs to be greater than the disk drive’s block size, as a logical block is split into several physical blocks. For this, the disk drives are formatted with a block size of less than 512 bytes. As a result, the number of drive block headers increases, which marginally reduces the disk drive’s storage capacity. The number of data drives in a RAID-3 configuration is limited to a power of two. The commonly used configurations have four or eight data drives. Certain disk controllers claim to implement RAID-3. However, they have a segment size, a concept which is incompatible with RAID-3. We will continue our discussion of this level in the next slide.

3.17 RAID Level 3 (contd.)

Irrespective of the request size of the I/O and the access type (read or write), every I/O to the array accesses all the array drives. During a write operation, RAID-3 stores a portion of each block on each data disk. RAID-3 also computes the parity of the data and writes it to the parity drive. As shown in the figure on the slide, the data word A is stored in the data store collection as A0, A1, A2, and A3. ‘A parity’ is generated using a parity generator during the storage operation. A similar operation is also performed for data words B, C, and D. The reliability level provided by RAID-3 is similar to that of RAID-4 and RAID-5; however, the I/O bandwidth offered on small requests is greater in RAID level 3. In RAID-3, there is no impact on performance while writing. In this level, each operation occupies all the drives; as a result, multiple operations cannot be performed simultaneously on the array. An implementation with a segment size that claims to be RAID level 3 may actually be RAID-4. We will learn about RAID level 4, that is, dedicated parity drive, in the next slide.

3.18 RAID Level 4

In RAID-4, data is striped across multiple drives at the block level, and the parity is stored on a single dedicated drive. The parity information allows recovery from the failure of a single drive. As shown in the figure on the slide, data A is striped at block level as A0 to A3. During this operation, the parity of each block is calculated and stored in the dedicated parity drive. Here, for data block A, the parity is ‘A parity’ for Block 0 to Block 3. Similar to level 0, RAID-4 array performance is very good for read operations. For write operations, the parity data has to be updated each time. As a result, the speed of small random writes is reduced; however, that of large or sequential writes is not significantly affected. The per-megabyte cost of a RAID-4 array is not high, as only one drive in the array stores redundant data. We will learn about RAID level 5, that is, block interleaved distributed parity, in the next slide.
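The way parity allows recovery from a single drive failure can be illustrated with XOR, the parity operation named later for RAID-6. The sketch below is a simplified model under stated assumptions: blocks are short byte strings of equal length, and `xor_blocks` and `rebuild_missing_block` are hypothetical helper names, not part of any RAID product.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must exist and have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    # Data word A striped as A0..A3, one block per data drive.
    a_blocks = [b"\x01\x02", b"\x04\x08", b"\x10\x20", b"\x40\x80"]
    a_parity = xor_blocks(a_blocks)          # stored on the dedicated parity drive

    # Simulate losing the drive that held A2, then rebuild it.
    survivors = a_blocks[:2] + a_blocks[3:]
    print(rebuild_missing_block(survivors, a_parity) == a_blocks[2])
```

The same arithmetic shows why small writes are slow: changing any one data block means the parity block must be recomputed and rewritten as well.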

3.19 RAID Level 5

RAID-4 and RAID-5 are similar, except that in RAID level 5 the parity is distributed among the drives. As there is no single parity disk to hold up the operation, the speed of small writes in multiprocessing systems is increased. The figure on the slide shows that for data block A, each stripe A0 to A3 is stored in a single physical disk, and the parity generated is distributed among the entire data store collection. The operation performed is similar for data blocks B, C, D, and E. Read performance is lower here, as the parity data must be skipped whenever it is encountered in the data store. During error correction, the read operation is slower still, as all the parity data must be read to correct the data. The cost per megabyte is the same for both level 4 and level 5. Let us move on to RAID level 6, that is, independent data disk with double parity, in the next slide.
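The difference between a dedicated parity drive and distributed parity is easiest to see as a layout table. The following Python sketch prints one possible placement, assuming the parity block moves one drive to the left on each successive stripe. That rotation order is an assumption for illustration; the figure on the slide may use a different order, and `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return, per stripe, the label of the block stored on each disk."""
    if num_disks < 3:
        raise ValueError("this sketch assumes at least three disks")
    if not 1 <= num_stripes <= 26:
        raise ValueError("stripes are labelled A to Z in this sketch")
    layout = []
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and shifts left.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        letter = chr(ord("A") + stripe)
        row, block = [], 0
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"{letter} parity")
            else:
                row.append(f"{letter}{block}")
                block += 1
        layout.append(row)
    return layout


if __name__ == "__main__":
    # Five disks: four data blocks (A0..A3) plus one parity block per stripe.
    for row in raid5_layout(num_disks=5, num_stripes=5):
        print(" | ".join(f"{label:>8}" for label in row))
```

In the printed table every disk holds parity for exactly one of the five stripes, so no single drive has to absorb all the parity updates.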

3.20 RAID Level 6

Levels 5 and 6 are similar; however, RAID-6 uses an additional independent parity scheme which permits extra fault tolerance. In this level, the data is striped at block level across a set of drives. A second parity set is calculated and written across all the drives. As shown in the figure on the slide, for extra fault tolerance, two parities are generated using two different algorithms. In the figure, XOR (the parity generation algorithm) is represented as ‘P’ and Reed-Solomon ECC code generation is represented as ‘Q’. The storage process is similar to that of level 5; the difference is that double parity is generated and distributed across the drives. The fault tolerance in RAID-6 is high, and simultaneous failures of up to two drives can be sustained. In the following slide, let us move on to RAID-0+1, also known as mirror of stripes.

3.21 RAID Level 0 + 1

Implementation of this level requires a minimum of four drives. RAID 0+1 is a mirrored array whose segments are RAID 0 arrays. The fault tolerance of RAID 0+1 and RAID 5 is the same. The overhead for fault tolerance of RAID 0+1 is the same as for mirroring alone. As shown in the figure on the slide, when data word A is to be stored in this setup, the RAID controller stripes the data and passes it to the data store. This RAID level then mirrors the striped data, as depicted in the figure. In RAID 0+1 there are multiple stripe segments, due to which the I/O rates achieved are higher. We will look into the last RAID level, 1+0, that is, a stripe of mirrors, in the next slide.

3.22 RAID Level 1 + 0

Implementation of this level requires a minimum of four drives. RAID 1+0 is implemented as a striped array whose segments are RAID 1 arrays. The fault tolerance of this level is the same as that of level 1, and the overhead is the same as for mirroring alone. As shown in the figure on the slide, the data word ‘A’ is first striped in the RAID controller as ‘A0’ and ‘A1’. While the data stripe is stored in the data store, the mirroring operation takes place simultaneously. Similar operations are performed for data words ‘B’, ‘C’, and ‘D’. The striping of RAID 1 segments results in high read and write speeds. This level is beneficial for those who want RAID 1 but also require an extra boost in performance. In the next slide, we will look at the disk drive technology for storage configuration and provisioning.
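One way to compare the overhead of the RAID levels covered so far is to work out how much of the raw disk space remains usable. The Python sketch below uses the commonly cited rules: striping keeps everything, mirroring keeps one copy, single parity gives up one drive, and double parity gives up two. It assumes equally sized drives and two-way mirroring for levels 0+1 and 1+0; the minimum drive counts for levels 4, 5, and 6 are the commonly cited ones rather than figures from the slides, and `usable_capacity_gb` is a hypothetical helper.

```python
MINIMUM_DRIVES = {"0": 2, "1": 2, "4": 3, "5": 3, "6": 4, "0+1": 4, "1+0": 4}


def usable_capacity_gb(level: str, num_drives: int, drive_size_gb: float) -> float:
    """Estimate usable capacity for an array of equally sized drives."""
    if level not in MINIMUM_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MINIMUM_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MINIMUM_DRIVES[level]} drives")
    if level == "0":
        data_drives = num_drives            # no redundancy at all
    elif level == "1":
        data_drives = 1                     # every other drive is a copy
    elif level in ("4", "5"):
        data_drives = num_drives - 1        # one drive's worth of parity
    elif level == "6":
        data_drives = num_drives - 2        # two drives' worth of parity
    else:
        if num_drives % 2:
            raise ValueError("two-way mirroring needs an even number of drives")
        data_drives = num_drives // 2       # half the drives hold mirror copies
    return data_drives * drive_size_gb


if __name__ == "__main__":
    for level in ("0", "5", "6", "0+1", "1+0"):
        print(f"RAID {level}: {usable_capacity_gb(level, 4, 1000)} GB usable of 4000 GB")
```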

3.23 Disk Drive Technology

The increasing demand for more storage space at less cost led to the advancement in disk drive technology. Selecting storage for different scenarios requires some parameters to be considered. A spinning disk, such as a hard disk, is a form of rotational media generally used for storage purposes. The interface types of hard disks are: Advanced Technology Attachment (ATA), Integrated Drive Electronics (IDE), Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), and Fibre Channel (FC). The major differences between Solid State Drives (SSD) and spinning media are access speed, noise of the drive, power consumption, and data transfer rate. If the rotational speed of a Hard Disk Drive (HDD) is 3600 rpm (revolutions per minute), then the average rotational latency is 8.3 milliseconds. Similarly, for 4200 rpm it is 7.1 ms, for 7200 rpm it is 4.2 ms, and for 15000 rpm it is 2 ms. An SSD makes less noise than an HDD. Power consumption is lower in SSD and higher in spinning media. The data transfer rate is faster in SSD compared to other media. We will continue our discussion on disk drive technology in the next slide.
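The latency figures quoted for each rotational speed match the average rotational latency, that is, the time the platter takes to make half a revolution. A short Python check of that arithmetic follows; the function name is a hypothetical one chosen for this example.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Time for half a revolution, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    milliseconds_per_revolution = 60_000 / rpm
    return milliseconds_per_revolution / 2


if __name__ == "__main__":
    for rpm in (3600, 4200, 7200, 15000):
        print(f"{rpm:>5} rpm: {average_rotational_latency_ms(rpm):.1f} ms")
```

An SSD has no spinning platter, so this component of access time does not apply to it, which is one reason for the difference in access speed.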

3.24 Disk Drive Technology: Tiered Storage

Tiered storage or ‘tiering’ permits an organization to store data on the basis of performance, availability, cost, and recovery requirements of an application. Tiered storage requirements can also be determined by functional differences, for example, the need for replication and high-speed restoration. Tiers 1, 2, 3, and 4 are the four tiers of storage. Tier 1 storage systems have better performance, capacity, reliability, and manageability. Tier 2 data runs major business applications, for example, e-mail and ERP. Tier 2 is a balance between cost and performance. Tier 2 data does not require sub-second response time but still needs to be reasonably fast. Tier 3 is used for data that is not accessed every day and therefore not stored in expensive tier 1 or tier 2 disks. Tier 4 data is used for compliance requirements for keeping e-mails or data for long periods of time. Tier 4 data can be a large amount of data but does not need to be instantly accessible. A multi-tiered storage system provides an automated way to move data between more and less expensive storage systems, as an organization can implement policies that define what data fits into each tier and then manage how that data migrates between the tiers. In the next slide we will look at logical unit numbers in storage configuration and provisioning.
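A tiering policy of the kind described can be expressed as a small set of rules. The Python sketch below is purely illustrative: the `DataSet` fields and the rule order are assumptions made for this example, and a real organization would define its own criteria for what data fits into each tier.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of a data set, used only for this sketch."""
    name: str
    needs_subsecond_response: bool
    accessed_daily: bool
    compliance_archive: bool


def assign_tier(data_set: DataSet) -> int:
    """Map a data set to a storage tier from 1 (fastest) to 4 (archive)."""
    if data_set.needs_subsecond_response:
        return 1   # highest performance and reliability
    if data_set.accessed_daily:
        return 2   # major business applications such as e-mail and ERP
    if data_set.compliance_archive:
        return 4   # long-term retention, not instantly accessible
    return 3       # data that is not accessed every day


if __name__ == "__main__":
    examples = [
        DataSet("order database", True, True, False),
        DataSet("e-mail", False, True, False),
        DataSet("last year's reports", False, False, False),
        DataSet("retained e-mail archive", False, False, True),
    ]
    for item in examples:
        print(f"{item.name}: tier {assign_tier(item)}")
```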

3.25 Logical Unit Numbers

Logical Unit Numbers (LUNs) were originally used to identify SCSI devices. In storage networking, LUNs operate as unique identifiers. However, they are more likely to represent a virtual hard disk from a block of allocated storage within a NAS device or a SAN. Network shares are storage resources that are made available across the network and appear as resources on the local machine, whereas zoning and LUN masking are configuration options that limit access to storage resources. Multipathing is a way of making data more available or fault tolerant to the computers that need to access it. Multipathing does exactly what its name suggests: it creates multiple paths for the computer to reach the storage resources it is attempting to contact. If an attempt is made to add storage capacity in the NAS or SAN box, the major implication is downtime, as the hardware resources cannot be scaled up in live mode. The downtime may result in loss of business or a breach of availability commitments. The impact of such operations may be loss of data, availability, and SLA commitments, resulting in a disaster. One of the best practices in storage is to always maintain multiple sites to ensure availability. Another is to implement a RAID setup so that data is mirrored across different disks. Let us move on to the different file systems in the next slide.
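The idea behind multipathing, trying another path when one fails, can be sketched as follows. This is a toy model under stated assumptions: each path is represented by a Python function, and `read_with_failover`, `broken_path`, and `healthy_path` are hypothetical names. Real multipathing is handled by the operating system or storage driver and usually also balances load across paths.

```python
from typing import Callable, Sequence

Path = Callable[[int], bytes]


def read_with_failover(paths: Sequence[Path], block_no: int) -> bytes:
    """Try each path to the storage in turn and return the first successful read."""
    last_error: OSError | None = None
    for path in paths:
        try:
            return path(block_no)
        except OSError as error:
            last_error = error      # remember the failure and try the next path
    raise OSError("all paths to the storage resource failed") from last_error


def broken_path(block_no: int) -> bytes:
    """Stand-in for a path whose cable, adapter, or switch has failed."""
    raise OSError("link down")


def healthy_path(block_no: int) -> bytes:
    """Stand-in for a working path to the same storage resource."""
    return f"block-{block_no}".encode()


if __name__ == "__main__":
    print(read_with_failover([broken_path, healthy_path], 7))
```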

3.26 Different File Systems

The different types of file systems are: New Technology File System, File Allocation Table, Unix File System, Extended File System, Z File System, and Virtual Machine File System. Let us discuss the first file system, NTFS, in the next slide.

3.27 New Technology File System

New Technology File System (NTFS) is the standard file system of several operating systems (OS): Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, Windows Vista, the Windows NT Advanced Server network operating system, and Windows 7. It replaces File Allocation Table (FAT) as the file system of the Microsoft Windows series and improves on both High Performance File System (HPFS) and FAT in several respects. NTFS provides additional functions such as file system journaling and Access Control Lists (ACLs). Security is implemented at the level of directories and files. NTFS also provides recovery, data protection, and long file names. It supports large volumes, which can span multiple hard disks. File ownership and access are controlled by the built-in security features of NTFS. DOS and some other operating systems cannot access NTFS partitions directly; third-party software is required to read and write files on them. With NTFS-3G, for example, an NTFS partition can be read and written without any data loss. We will continue our discussion on file system types in the next slide.

3.28 File Allocation Table

Microsoft Corporation invented the File Allocation Table (FAT) file system for MS-DOS and partially owns it. FAT is also the file system of all non-NT-based versions of Microsoft Windows. The file system was kept simple because the computers of the time had limited capability. Owing to this simplicity, the OS of nearly every personal computer can use FAT, which also makes it useful for exchanging data between operating systems. In FAT, when a file is deleted and its space is reused for new data, the sections of the new data are normally scattered across the disk, which reduces the read and write speed. This issue can be addressed by disk defragmentation; however, to keep FAT efficient, defragmentation needs to be run frequently. Disk defragmentation is the process of rearranging the scattered sections of each file into contiguous blocks on the disk. Let us discuss Unix File System in the next slide.

3.29 Unix File System

Several UNIX and UNIX-like operating systems use the Unix File System (UFS), also called the Berkeley Software Distribution Fast File System (BSD FFS). It reduces the fragmentation that is caused when the content of a directory is scattered over a whole disk. UFS has been adopted by the vendors of a few proprietary UNIX systems, such as Solaris, System V Release 4, HP-UX, and Tru64 UNIX. Most of these vendors adapted UFS for their own use, adding proprietary extensions that the versions of UNIX used by other vendors do not recognize. Some compatibility remains across the platforms, as several vendors continue to use the original block size and data field widths. The primary benefit of this file system is the ability to share data across multiple platforms. To share data, compatibility between implementations must be maintained. This compatibility depends on platform independence, that is, independence from the type of hardware and operating system. File system journaling was brought to UFS with Solaris 7, in which Sun Microsystems included UFS Logging. In the next slide we will look into the extended file system.

3.30 Extended File System

The table on the slide shows different versions of this file system and their characteristics. The Extended File System (EXT) was the first file system created specifically for the Linux kernel and was implemented in April 1992. Its metadata structure is inspired by the traditional UFS. The Second Extended File System (EXT 2) was developed by Rémy Card and introduced in 1993. Depending on the block size, its maximum individual file size ranges from 16 GB to 2 TB, and its maximum overall file system size ranges from 2 TB to 32 TB. The Third Extended File System (EXT 3), introduced in 2001, was developed by Stephen Tweedie and has been available since Linux Kernel 2.4.15. Its maximum individual file size ranges from 16 GB to 2 TB, and its maximum overall file system size ranges from 2 TB to 32 TB. The Fourth Extended File System (EXT 4), introduced in 2008, has been available since Linux Kernel 2.6.19. Its maximum individual file size ranges from 16 GB to 16 TB, and its maximum overall file system size is 1 Exabyte (EB). One Exabyte equals 1024 petabytes (PB), and 1 PB equals 1024 terabytes (TB). EXT 4 has several new features, such as multi-block allocation, delayed allocation, and journal checksums. In the next slide, we will discuss another file system type, ZFS.
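The unit conversion above can be checked with a short calculation. The function name `eb_to_tb` is only illustrative; the factors of 1024 and the 16 TB file limit come from the text.

```python
TB_PER_PB = 1024  # 1 PB = 1024 TB
PB_PER_EB = 1024  # 1 EB = 1024 PB


def eb_to_tb(exabytes):
    """Convert exabytes to terabytes using the 1024-based units above."""
    return exabytes * PB_PER_EB * TB_PER_PB


max_fs_tb = eb_to_tb(1)   # largest EXT 4 file system, in TB
max_file_tb = 16          # largest single EXT 4 file, in TB

print(max_fs_tb)                  # 1048576
print(max_fs_tb // max_file_tb)   # 65536 maximum-size files would fit
```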

3.31 Z File System

The Z File System (ZFS) is trademarked by Oracle. It was first developed by Sun Microsystems and is designed to use a pooled storage method for data storage. The file system itself allots space in the storage pool directly and intelligently. ZFS is designed to protect against data corruption and to maintain maximum data integrity, and it supports snapshots of data, multiple copies, and data checksums. RAID-Z is a software data replication model used by ZFS. The redundancy provided by RAID-Z is similar to that of hardware RAID; however, RAID-Z is designed to prevent the corruption of data and to overcome a few constraints of hardware RAID.
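The way checksums and multiple copies work together to protect data can be sketched as follows. This is a conceptual illustration only, not ZFS's actual on-disk format or code: the function names are hypothetical, and SHA-256 from Python's standard library is used simply as a convenient checksum.

```python
import hashlib


def store_block(data):
    """Store data together with a checksum computed at write time."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}


def is_intact(block):
    """Recompute the checksum on read and compare it with the stored one."""
    return hashlib.sha256(block["data"]).hexdigest() == block["checksum"]


def read_with_repair(copies):
    """Return the data from the first copy whose checksum still verifies."""
    for copy in copies:
        if is_intact(copy):
            return copy["data"]
    raise IOError("no intact copy of the block is available")


# Keep two copies, then simulate silent corruption of the first one.
copies = [store_block(b"payroll records"), store_block(b"payroll records")]
copies[0]["data"] = b"payroll recordz"

print(is_intact(copies[0]))       # False
print(read_with_repair(copies))   # b'payroll records'
```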

3.32 Virtual Machine File System

In this slide, we will learn about the last file system type, the Virtual Machine File System (VMFS). It is a cluster file system developed by VMware Inc. VMware VMFS facilitates storage virtualization for multiple installations of VMware ESX Server, the hypervisor that partitions physical servers into multiple virtual machines. With this file system, new virtual machines can be created without the oversight of a storage administrator, and the size of a volume can be changed without hampering network operations. More than one installation of VMware ESX Server can read from and write to an individual storage location simultaneously. VMware ESX servers can be added to or removed from a VMFS volume without affecting the other hosts. The I/O of each virtual machine can be optimized by adjusting the file and block sizes. In case of server failure, distributed journaling prevents catastrophic loss of data and allows rapid system recovery. We will discuss the basic network configuration concepts in the next slide. These concepts will help us understand the implementation of cloud computing better.

3.33 Network Configuration LAN, WAN and MAN

Local Area Network (LAN), Wide Area Network (WAN), and Metropolitan Area Network (MAN) are the original categories of area networks. A LAN connects network devices over a relatively short distance. A building can contain a few small LANs, and occasionally a LAN can span a group of nearby buildings. LANs are typically owned, controlled, and managed by a single person or organization. They primarily use Ethernet and Token Ring. A WAN spans a large physical distance. A network device called a router connects LANs to a WAN. In IP networking, the router maintains both a LAN address and a WAN address. WANs tend to use technologies like ATM, Frame Relay, and X.25. A MAN is similar to a LAN but spans an entire city, which is made possible by networking devices called bridges or routers. In the next slide we will describe the uses of NAT and PAT.

3.34 Network Configuration Uses of NAT and PAT

The main use of Network Address Translation (NAT) is to limit the number of public IP addresses that an organization or company must use, for both economy and security purposes. When a computer on the private network requests an internet resource, routers within the network recognize that the request is not for a resource inside the network, so they send the request to the firewall. The firewall sees the request from the computer with the internal IP address. It then makes the same request to the internet using its own public address, and returns the response from the internet resource to the computer inside the private network. In this way, NAT allows private network computers to access the public network via a common public IP address. In large networks, some servers are assigned public IP addresses on the firewall, allowing the public to access the servers only through that IP address. NAT allows efficient routing of internal network traffic to the same resources, and allows access to more ports, while restricting access at the firewall. It also allows detailed logging of communications between the network and the outside world. Additionally, NAT can be used to allow selective access outside the network: workstations or other computers requiring special access outside the network can be assigned specific external IP addresses using NAT. Port Address Translation (PAT) is similar to NAT: it maps private IP addresses to public IP addresses and, by giving each connection a distinct port number, can map multiple devices on a network to a single public IP address. We will learn about basic concepts of network configuration, in the next slide.
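The port-based mapping that lets many private devices share one public address can be sketched as a translation table. This is a simplified model under stated assumptions: the class name `PatTable`, the starting port 40000, and the example addresses are hypothetical, and a real firewall also tracks the protocol, the remote endpoint, and timeouts.

```python
class PatTable:
    """Toy Port Address Translation table for a single public IP address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Map an internal endpoint to the shared public IP and a unique port."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Find which internal endpoint a returning packet belongs to."""
        return self.inbound.get(public_port)


pat = PatTable("203.0.113.5")
print(pat.translate_out("192.168.1.10", 5000))  # ('203.0.113.5', 40000)
print(pat.translate_out("192.168.1.11", 5000))  # ('203.0.113.5', 40001)
print(pat.translate_in(40001))                  # ('192.168.1.11', 5000)
```

Both internal computers appear on the internet as 203.0.113.5; only the port number tells their traffic apart.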

3.36 Basic Concepts of Network Configuration

Subnetting is the practice of creating subnetworks, or subnets. A subnet is a logical subdivision of an IP network. Using subnets may be useful in large organizations where it is necessary to allocate address space efficiently. They may also be utilized to increase routing efficiency, and offer improved controls for network management. Inter-subnet traffic is exchanged by routers, just as it would be exchanged between physical networks. Supernetting is the reverse of subnetting: it combines several contiguous networks into one larger network. Like subnetting, it works by dividing the IP address into a network bit group and a host identifier bit group. A virtual local area network, or VLAN, is the concept of partitioning a physical network to create separate, independent broadcast domains that are part of the same physical network. A trunk link, also known as a “trunk,” is a link that transports packets for any VLAN. Trunk ports are usually found in connections between switches, and require the ability to carry packets from all available VLANs. A routing table is a data table stored on a router that the router uses to determine where to forward the network packets it is responsible for routing. Switching and routing are managed the same way in physical and virtual environments, with the difference that in virtual environments the virtual router resides in the hypervisor. Network optimization is the process of efficiently operating a network, and it depends on bandwidth and network latency. Compression is the reduction in the size of data that is traveling across the network, achieved by converting that data into a format that requires fewer bits for the same transmission. Caching is the process of storing frequently accessed data in a location closer to the device that is requesting the data. Load balancing is the process of distributing incoming HTTP or application requests evenly across multiple devices or web servers so that no single device is overwhelmed. In the next slide, we will analyze a scenario related to network configuration.
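Subnetting and supernetting can be tried out with Python's standard `ipaddress` module. The 192.168.0.0/24 network below is only an example address range chosen for illustration.

```python
import ipaddress

network = ipaddress.ip_network("192.168.0.0/24")

# Subnetting: borrow 2 host bits to split the /24 into four /26 subnets.
subnets = list(network.subnets(prefixlen_diff=2))
for subnet in subnets:
    print(subnet, "-", subnet.num_addresses, "addresses")

# Supernetting: give back 1 network bit to get the enclosing /23.
supernet = network.supernet(prefixlen_diff=1)
print(supernet)  # 192.168.0.0/23

# Two contiguous /24 networks collapse into that same /23 supernet.
merged = list(ipaddress.collapse_addresses([
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
]))
print(merged)
```

Each /26 subnet holds 64 addresses, while the /23 supernet covers both 192.168.0.x and 192.168.1.x.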

3.37 Scenario on Understanding Network Configurations

An administrator wants to share the host system’s Internet connection with the virtual machines. Which of the following network connections would you suggest be implemented: PAT, NAT, or DHCP? The best solution is to use NAT, as it allows the virtual machines to share the Internet connection of the host system. In the next slide, let us look into an interaction describing the troubleshooting tools.

3.38 Troubleshooting Tools

The tools for troubleshooting are: Ping, Tracert, Telnet, Netstat, Nslookup, Ipconfig, Route, and ARP.

3.39 Scenario on Troubleshooting Tools

An end user on a VM allotted by the administrator is not able to connect to the remote VM. If the VM contains CentOS, which of the following commands would you suggest the administrator use to check the network adapter status? ipconfig, ifconfig or ping. Let us look at the answer in the following slide.

3.40 Scenario on Troubleshooting Tools(contd)

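For this situation, the solution is ifconfig. On Linux systems such as CentOS, the network administrator uses the ifconfig command to check network adapter status; ipconfig is its Windows counterpart, and ping tests connectivity rather than adapter status. Let us move on to the standard protocols and port numbers, in the next slide.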

3.41 Standard Protocols and Port Numbers

The Open Systems Interconnection (OSI) model, like other models for network communication, provides only a conceptual framework for computer communication. In data communication, a network protocol is a formal set of rules, data structures, and conventions. In computer networking, the senders and receivers of messages are identified using addressing information, of which port numbers form a part. Hypertext Transfer Protocol (HTTP) is used for handling web requests, processing them using server pages, and returning the response to the client system. The default port number for HTTP is 80. File Transfer Protocol (FTP) is used to upload files to and download files from a server. The default port number for FTP is 21. Secure Shell (SSH) provides a secure channel to access a system in console mode. The default port number for SSH is 22. Terminal Network (Telnet) is also used for accessing a system in console mode, but the data transferred from source to destination is not encrypted, so it is less secure than SSH. The default port number for Telnet is 23. Simple Mail Transfer Protocol (SMTP) is used for sending outgoing email. The default port number for SMTP is 25. Post Office Protocol (POP) is used by a mail client to retrieve the incoming mail delivered by the SMTP server. The default port number for POP, in its current version POP3, is 110. We will continue this discussion in the next slide.

3.42 Standard Protocols and Port Numbers(contd)

Remote Desktop Protocol (RDP) is used for accessing the desktop of a remote computer. The default port number for RDP is 3389. Virtual Network Computing (VNC) is used for accessing a remote machine either in a webpage, that is, a web console, or in standalone software, that is, a software console. The default port number for VNC is 5800 for the web console and 5900 for the software console. Internet Protocol Security (IPsec) is used in VPN implementations. It is a protocol suite that adds security to the data communication process. The port number listed for IPsec is 1293; note that IPsec key exchange (IKE) uses UDP port 500 by default. MySQL is used for communication with the database. Its default port number is 3306. Domain Name System (DNS) is used for converting domain names into the corresponding IP addresses. Its default port number is 53. HTTP Secure (HTTPS) is used for secure communication over HTTP using SSL or TLS. Its default port number is 443. Bootstrap Protocol (BOOTP) is used to maintain the IP pool for DHCP. Its default port numbers are 67 for the server and 68 for the client. In the next slide, we will focus on the common network protocols and types of networks.
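The well-known default ports from these two slides can be kept in a lookup table and used to test whether a service is reachable. This is a minimal sketch: the names `DEFAULT_PORTS`, `service_for_port`, and `is_port_open` are illustrative, and the reachability check only confirms that a TCP connection can be opened, not that the expected protocol is actually running on that port.

```python
import socket

# Default TCP ports for protocols covered in this module.
DEFAULT_PORTS = {
    "FTP": 21,
    "SSH": 22,
    "Telnet": 23,
    "SMTP": 25,
    "DNS": 53,
    "HTTP": 80,
    "POP3": 110,
    "HTTPS": 443,
    "MySQL": 3306,
    "RDP": 3389,
    "VNC": 5900,
}


def service_for_port(port):
    """Return the protocol name for a default port, or None if not listed."""
    for name, number in DEFAULT_PORTS.items():
        if number == port:
            return name
    return None


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(service_for_port(443))  # HTTPS
```

For example, `is_port_open("server.example", DEFAULT_PORTS["SSH"])` would report whether the hypothetical host `server.example` accepts connections on port 22.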

3.43 Common Network Protocols

Some of the common protocols that are used for network configuration are FTPS, SFTP, DNS and DHCP. Let us understand these protocols in the following slide.

3.44 Common Network Protocols and Types of Networks

In network port configuration, every service runs on its default port number until port mapping is performed. Protocols can be mapped in a similar way. Some of the common protocols that cloud service providers generally deal with are FTPS, SFTP, DNS, and DHCP. FTPS is an extension of FTP that allows clients to request that their FTP session be encrypted. SFTP uses SSH to secure the file transfer, whereas FTPS uses SSL or TLS. Each domain has authoritative name servers. DNS distributes the responsibility of assigning domain names and mapping them to IP addresses to these authoritative name servers. Dynamic Host Configuration Protocol (DHCP) is a network protocol that allows a server to automatically assign IP addresses, from a predefined range of numbers called a scope, to computers on a network. DHCP is responsible for assigning IP addresses to computers, and DNS is responsible for resolving names to those IP addresses. In the next slide, we will focus on the common hardware resources and features.
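How a DHCP server hands out addresses from a scope can be sketched with a toy allocator. This is a simplified model under stated assumptions: the class name `DhcpScope`, the MAC addresses, and the address range are hypothetical, and real DHCP also involves lease times, renewals, and a four-message exchange between client and server.

```python
import ipaddress


class DhcpScope:
    """Toy DHCP scope: leases addresses from a fixed range to clients."""

    def __init__(self, first, last):
        self.first = ipaddress.ip_address(first)
        self.last = ipaddress.ip_address(last)
        self.leases = {}  # client MAC address -> leased IP address

    def request(self, mac):
        """Return the client's existing lease, or the next free address."""
        if mac in self.leases:
            return self.leases[mac]
        in_use = set(self.leases.values())
        candidate = self.first
        while candidate <= self.last:
            if candidate not in in_use:
                self.leases[mac] = candidate
                return candidate
            candidate += 1
        raise RuntimeError("scope exhausted: no free addresses")


scope = DhcpScope("10.0.0.10", "10.0.0.11")
print(scope.request("aa:aa:aa:aa:aa:aa"))  # 10.0.0.10
print(scope.request("bb:bb:bb:bb:bb:bb"))  # 10.0.0.11
print(scope.request("aa:aa:aa:aa:aa:aa"))  # 10.0.0.10 (same lease again)
```

A third client would raise an error here, because this scope contains only two addresses.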

3.45 Common Hardware Resources and Features

The common hardware resources and features used to enable virtual machines are: BIOS configurations; minimum memory capacity and configuration; number of CPUs; number of cores; NIC quantity, speeds, and configurations; internal hardware compatibility; and storage media like tape, SSD, USB, and disk. Let us briefly discuss these in the next slide.

3.46 Common Hardware Resources and Features(contd)

The Basic Input/Output System (BIOS) must support virtualization. To check this on Intel processors, go to the BIOS settings, search for the Intel VT option, and enable it. If no such option exists, the BIOS is not compatible with virtualization. Capacity planning is essential for virtualization. Memory (RAM) must be sized according to the number of users; the minimum configuration for RAM is 8 GB. The number of CPUs and cores depends on the type of operation in the production environment; however, the minimum requirement is 2 CPUs, with a minimum of 8 cores in each CPU. Network Interface Card (NIC) configuration depends on how much speed is required and how many VLANs are present in the production environment. Typically, a 1 Gbps NIC is required. Every hypervisor vendor publishes a specific list of hardware that is compatible with its virtualization products. Storage media can be RAID, iSCSI, SAN, tape, SSD, USB, disk, and so on, depending on the production environment and the budget. Let us move on to the quiz questions to check your understanding of the concepts covered in this module.
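On a Linux host, hardware virtualization support can also be checked from the operating system by looking for the `vmx` (Intel VT-x) or `svm` (AMD-V) CPU flag in `/proc/cpuinfo`. The sketch below is an illustration only: the function name `virtualization_flags` and the sample text are hypothetical, and the flag shows what the CPU supports, so the option may still need to be enabled in the BIOS.

```python
def virtualization_flags(cpuinfo_text):
    """Return the virtualization flags ('vmx', 'svm') found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


# Shortened, made-up sample of /proc/cpuinfo content.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse sse2\n"
print(virtualization_flags(sample))  # {'vmx'}
```

On a real Linux system the text would come from reading `/proc/cpuinfo`, for example with `open("/proc/cpuinfo").read()`.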

3.48 Summary

Here is a quick recap of what was covered in the module: Networking is the interconnection of computers for communication. The various storage technologies available are NAS, SAN, and DAS. The various access protocols are FCoE, FC, and Ethernet. The different file systems are ZFS, NTFS, FAT, and VMFS. RAID is used to combine multiple disks so that they logically form one disk with better storage capabilities.

3.49 Thank You

In the next module, we will discuss network management.
