Pat Shuff

Oracle Blogs

console access for Linux server

Fri, 2016-09-30 09:00
Sometimes you just have to have access to a console for graphical use to install software. Many software products provide a command line or silent install, but for some reason vendors insist on a graphical user interface to install software. With Windows this is a simple task because getting to the command line is the difficult part. We talked about configuring the Oracle Compute Cloud security rules on June 6th and getting graphical access to a Windows desktop in the cloud. We could follow the same procedure and setup to gain access to a graphical interface on a Linux instance. We would need to either pass the X-Windows interface through ssh or open up ports for VNC (Virtual Network Computing) so that a client can connect to a VNC server such as TigerVNC on the Linux server.

Oracle has a dedicated team that works on difficult problems like mirroring E-Business Suite or ingesting Salesforce data into BI repositories for analysis. You can find their postings at www.ateam-oracle.com. One of their posts from April 28th details how to set up a VNC server and communicate with it securely from a Windows desktop, using a PuTTY ssh tunnel to a TigerVNC server installed on a Linux instance in the Oracle Compute Cloud. The installation instructions cover setting up PuTTY on the Windows desktop as well as installing the X-Windows user interface and TigerVNC on the Linux server.

There are basically two options for this configuration. The first option is to set up the VNC software to run on the Linux server and have it listen on all network connections. If you set up a Corente VPN to connect your office to the Oracle Cloud, nothing else needs to be done other than opening up port 5901 on your virtual private network. Once this is done you will have access to the console from any computer in your office that is connected to the VPN subnet. You could also open up port 5901 to the public internet, but this is not recommended because you are exposing a potential security hole that someone can exploit. It is easy to configure this short term and turn it off when you are done testing the configuration. Alternatively, you can set up an ssh tunnel from a computer in your office; anyone who connects to port 5901 on that computer is tunneled through the ssh connection to port 5901 on the Oracle Cloud instance. This is a simple and secure way of connecting one desktop to one computer. It also provides a single access point for people to use to connect to a server in the cloud.
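
As a rough sketch of the tunnel option, assuming a Linux or MacOS desktop, a TigerVNC server running as display :1 (port 5901) on the cloud instance, and the usual opc login with an ssh private key (the key path and public IP below are placeholders):

 --- forward local port 5901 across ssh to port 5901 on the cloud instance
   ssh -i ~/.ssh/oracle_cloud_key -N -L 5901:localhost:5901 opc@<public-ip>
 --- in a second terminal, point a VNC viewer at the local end of the tunnel
   vncviewer localhost:1

PuTTY on Windows accomplishes the same thing through Connection > SSH > Tunnels with a source port of 5901 and a destination of localhost:5901, which is essentially what the A-Team tutorial walks through.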

One of the key questions that you need to ask when configuring a console connection is how you are going to use it. Do you have a team of administrators that are all going to access a server in the cloud? Do they need simultaneous access? Do you need a VPN to connect other servers and services, or are you setting it up just for console access? There is a cost of $75/month to run a VPN server because you have to pay for a compute instance to run the connection. In my opinion, paying $75/month to connect one person to one computer is not worth the cost; you can do the same thing with an ssh tunnel. If, on the other hand, you have 3-4 admins that are trying to access services and developers that are trying to access other servers, having them set up a dozen or so ssh tunnels every time they want to connect is not an efficient use of time.

In summary, many people only look at how to do something. They don't take a step back and look at why they are doing it. If the software that you are installing is only available with a graphical installer, then you have to set up a connection once, possibly twice. If you can run the installer in silent mode with a configuration file to feed it the mouse clicks and text entry, go with the silent mode install. Scripting allows you to automate the install and put it in the ora-init of your Orchestration, reducing time and the potential for errors from selecting a different option during re-installation. If you don't have a need for a long term VPN, go with a short term ssh tunnel connection. You can always install the VNC software and leave it disabled on the Linux server in the Oracle Cloud. I did not go through screen shots of the installation and configuration because the A-Team did a great job in their tutorial.

links to Corente tutorials and workshops

Thu, 2016-09-29 09:44
Rather than go through a full install of the Corente VPN configuration, I thought I would post references to where I pulled tutorials. The obvious place to start is the Corente Documentation. I would not start there, though; I would start with the tutorials and workshops. The documentation is confusing and could lead to you locking your account and getting frustrated (trust me). Yesterday we talked about starting the install process. Normally we would go through a full install, but rather than doing that I am going to reference three documents that I found internal to Oracle.

My favorite of the three is the Workshop because it goes through the install and configuration with pictures and step by step instructions. It starts out by installing Linux in the Oracle Cloud and configuring it to be the App Manager console, then gets the Orchestration up and running to start the gateway in the cloud. The single change that I would make to this configuration is to configure the on-premises system using a Linux image in VirtualBox locally rather than doing it from Firefox on your desktop. If you follow the same steps that you used to spin up a Linux image and install the packages in the cloud, you can do the same steps in VirtualBox. This deviation starts on page 26 with an install of Linux on a local VirtualBox and goes back to the Workshop without skipping a beat. I was able to follow the 52 page Workshop more easily than the 76 page Cookbook. On the flip side, I do like the Cookbook's focus on network configuration for the cloud Linux instance.

The net of all this discussion is that there are various ways to configure Corente. It is a time consuming project and you can't just click a button and make it work. If you want to integrate a Cisco or Juniper router in your data center, there are online instructions on configuring your router. We will not go through this because we don't have access to hardware to configure and play with.

In summary, this should be the last posting on Corente. It is a very powerful tool that allows you to create a virtual private network between computers in your home, office, or data center and computers in the cloud. It allows you to configure a typical two tier configuration in the Oracle Cloud and hide the database from the public internet while giving your DBAs and developers a direct connection to the database. It also allows you to replicate the production systems that are running in your data center and create a high availability site in the Oracle Cloud. This can be done using NFS or SMB file shares and rsync to keep files synchronized, or Data Guard to replicate database data between two servers. Corente VPN allows you to create a trusted and secure communication link between your data center and the Oracle Cloud.
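
As a minimal sketch of the rsync option mentioned above, assuming the VPN is up, that the cloud server is reachable at 192.168.200.10 on the private subnet, and that /u01/app is the directory being mirrored (both the address and the path are placeholders):

 --- push changes across the VPN; schedule from cron for continuous synchronization
   rsync -az --delete /u01/app/ opc@192.168.200.10:/u01/app/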

Corente on VirtualBox revisited

Wed, 2016-09-28 16:20
Last week we started talking about setting up Corente and came to the conclusion that you cannot run the Corente Gateway in VirtualBox. It turns out that not only was I wrong, but I got a ton of mail from product managers, people who got it working, and people who generally did not agree with my conclusion. Ok, I will admit that I read the manuals, played with the suggested configurations, and tried deploying it on my own. It appears that I did a few things backwards and cornered myself into a configuration that caused things not to work. Today we are going to walk through the steps needed to get Corente up and running in your data center using VirtualBox as a sandbox.

The first thing that you absolutely need is a Corente admin account. Without this you will not be able to create a configuration to download, and everything will fail. You should have received an account email from "no-reply-cloud@oracle.com" with the title "A VPN account was created for you". If you have multiple accounts you should have received multiple emails. This is a good thing if you got multiples; it is a bad thing if you did not get any. I received mine back on August 11th of this year. I received similar emails back on April 27th for some paid accounts that I have had for a while. The email reads:

The VPN account information included in this email enables you to sign in to App Net Manager Service Portal when setting up Corente Services Gateway (cloud gateway) on Oracle Cloud, which is Step 2 of the setup process.
Account Details
Username:  a59878_admin
Password: --not shown--
Corente Domain:  a59878
Click here for additional details about how to access your account. The link takes you to the documentation on how to set up a service gateway. The document was last updated in August and goes through the workflow for setting up a connection.

Step 1: Obtain a trial or paid subscription to Oracle Compute Cloud Service. After you subscribe to Oracle Compute Cloud Service, you will get your Corente credentials through email after you receive the Oracle Compute Cloud Service welcome email.

Step 2: Set up a Corente Services Gateway (on-premises gateway) in your data center. This is where everything went off the rails the first time. This actually is not step 2. Step 2 is to visit the App Net Manager and register your gateway using the credentials that you received in the email. I went down the foolish path of spinning up a Linux 6 instance and running the verification to make sure that hardware virtualization gets passed to the guest operating system. According to the documentation, this is step 2, and VirtualBox fails all of the suggested tests. I then looked for a second way of running in VirtualBox; the old way, CSG-VE, is different from the gateway deployment, is intended for legacy Corente customers, and was never meant as a solution for the Oracle Cloud. If you follow the cookbooks that are available internal to Oracle, however, you can make the Corente Services Gateway work properly in VirtualBox. I found two cookbooks and both are too large to publish in this blog, so I will try to summarize the key steps. Ask your local sales consultant to look for "Oracle Corente Cloud Services Cook Book" or "Oracle Cloud Platform - Corente VPN for PaaS and IaaS". Both walk you through installation with screen shots and recommended configurations.

Step 2a: Go to www.corente.com/web and execute the Java code that launches the App Net Manager. When I first did this it failed; I had to download a newer version of Java to get the javaws binary installed. If you are on a Linux desktop you can do this with wget http://javadl.oracle.com/webapps/download/AutoDL?BundleId=211989 or go to https://java.com/en/download/linux_manual.jsp and download the Linux x64 bundle. This lets you uncompress and install the javaws binary and associate it with the jsp file provided on the Corente site. If you are on Windows or MacOS, go to https://java.com/en/download/ and it will figure out what your desktop is and ask you to download and install the latest version of Java. What you are looking for is a version containing the javaws binary. This binary is called from the web browser and executes the downloadable scripts from the Corente site.
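
A minimal sketch of the Linux desktop path, assuming the bundle unpacks into a jre1.8.0_101 directory and that the App Net Manager launch file was saved as appnetmanager.jsp (both names are illustrative and will vary):

 --- download and unpack the Java runtime that provides javaws
   wget -O jre-8-linux-x64.tar.gz "http://javadl.oracle.com/webapps/download/AutoDL?BundleId=211989"
   tar xzf jre-8-linux-x64.tar.gz
 --- launch the downloaded App Net Manager file with Java Web Start
   ./jre1.8.0_101/bin/javaws ~/Downloads/appnetmanager.jsp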

Step 2b: When you go to the www.corente.com/web site it will download Java code and launch the App Net Manager console (the screenshot of the console is omitted here).

The first time there will be no locations listed, so we will need to add a location. It is important to note that the physical address that you use for the location has no relevance to the actual address of your server, gateway, or cloud hosting service. I have been cycling through major league baseball park addresses as my locations: my gateway is currently located at Minute Maid Park in Houston, my desktop is at the Texas Rangers Ballpark in Arlington, and my server is at Wrigley Field in Chicago.

Step 2c: Launch the New Location Wizard. The information that will be needed is the name, address, maintenance window (date and reboot option), inline configuration, DHCP settings, an optional DHCP client name, and the LAN interface. Note that it is important to know ahead of time what your LAN interface is going to be. Once you get your gateway configured and connected, the only way to get back into this console is from this network. When I first did this I did not write down the IP address and basically locked my account; I had to go to another account domain and retry the configuration. For the trial that I did, I used 192.168.200.1 as the LAN address with 255.255.255.0 as the netmask. This will become your gateway for all subnets in your data center. By default there is a DHCP server in my house that assigns IP addresses on the 192.168.1.X network. You need to pick something different from this subnet because you can't have a broadband router acting as a gateway to the internet and a VPN router acting as a gateway router on the same subnet. The implication is that you will need to create a new network interface on your Oracle Compute Cloud instances so that they have a network connection that talks on the 192.168.200.X network. This is easy to do, but selecting this network is important and writing it down is even more important. The wizard will continue and ask about adding the subnet to the Default User Group. Click Yes and add the 192.168.200.X subnet to this group.

Step 2d: At this point we are ready to install a Linux 6 or Linux 7 guest OS in VirtualBox and download the Corente Services Gateway software from http://www.oracle.com/technetwork/topics/cloud/downloads/network-cloud-service-2952583.html. From here you agree to the legal stuff and download the Corente Gateway Image. This is a bootable image that works with VirtualBox.

Step 2e: We need to configure the instance with 2 GB of RAM, at least 44 GB of disk, and two network interfaces. The first interface needs to be configured as active using the Bridged Adapter. The second interface needs to be configured as active using the Internal Network. The bridged adapter is going to get an address on the 192.168.1.X network from our home broadband DHCP server. The second network is going to be statically mapped to 192.168.200.1 by the configuration that you download from the App Net Manager. You also need to mount the ISO image that was downloaded for the Corente Gateway Image. When the server boots it will load the operating system onto the virtual disk and ask to reboot once the OS is loaded.
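
A sketch of this step from the VirtualBox command line, assuming the VM is named corente-gw, the downloaded image is corente-gateway.iso, and the bridged NIC rides on the host adapter eth0 (all three names are placeholders; the same settings can be made in the VirtualBox GUI):

 --- create the VM with 2 GB of RAM and two NICs (bridged + internal)
   VBoxManage createvm --name corente-gw --ostype Oracle_64 --register
   VBoxManage modifyvm corente-gw --memory 2048 --nic1 bridged --bridgeadapter1 eth0 --nic2 intnet --intnet2 corente
 --- attach a 60 GB virtual disk (anything over 44 GB) and the downloaded gateway ISO
   VBoxManage createhd --filename corente-gw.vdi --size 61440
   VBoxManage storagectl corente-gw --name SATA --add sata
   VBoxManage storageattach corente-gw --storagectl SATA --port 0 --device 0 --type hdd --medium corente-gw.vdi
   VBoxManage storagectl corente-gw --name IDE --add ide
   VBoxManage storageattach corente-gw --storagectl IDE --port 0 --device 0 --type dvddrive --medium corente-gateway.iso
   VBoxManage startvm corente-gw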

Step 3: Rather than letting the instance reboot, we should stop it after the shutdown happens and remove the ISO as the default boot device. If we don't, we will go through the OS install again, and it will keep looping until we do. Once we boot the OS it will ask us to download the configuration file from the App Net Manager. We do this by setting the download site to www.corente.com, selecting DHCP as the network configuration, and entering our App Net Manager login information on the next screen.

Step 4: At this point we have a gateway configured in our data center (or home in my case) and need to set up a desktop server to connect through the VPN and access the App Net Manager. Up to this point we have connected to the App Net Manager from our desktop to do the initial configuration. From this point forward we will need to do so from an IP address on the 192.168.200.X network. If you try to connect to the App Net Manager from your desktop you will get an error message and nothing can be done. To install a guest system we boot Linux 6 or Linux 7 in VirtualBox and connect to https://66.77.134.249. To do this we need to set up the network interfaces on our guest operating system. The network needs to be the internal network. For my example I used 192.168.200.100 as the guest OS IP address and 192.168.200.1, our gateway server, as the default router. This machine is configured with a static IP address because by default the DHCP server on the 192.168.1.X network would answer the DHCP request and assign you an address on the wrong subnet. To get the App Net Manager to work I had to download javaws again for Linux and associate the jsp file from the www.corente.com/web site to launch using javaws. Once this was done I was able to add the guest OS as a new location.
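
A minimal sketch of the static network configuration on the guest, assuming Oracle Linux 6 and that the internal-network NIC shows up as eth0 (the interface name is an assumption; adjust it to whatever the guest reports):

 --- write the static configuration for the internal NIC
   sudo tee /etc/sysconfig/network-scripts/ifcfg-eth0 <<'EOF'
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.200.100
NETMASK=255.255.255.0
GATEWAY=192.168.200.1
EOF
 --- restart networking to pick up the change
   sudo service network restart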

At this point we have a gateway server configured and running and a computer inside our private subnet that can access the App Manager. This is the foundation to getting everything to work. From here you can then provision a gateway instance in the cloud service and connect your guest OS to computers in the cloud as if they were in the same data center. More on that later.

In summary, this was more difficult than I was hoping for. I made a few key mistakes when configuring the service. The first was not recording the IP address when I set everything up the first time. The second was using the default network behind my broadband router rather than a different network address. The third was assuming that the steps presented in the documentation were the steps that I had to follow. The fourth was not knowing that I had to set up a guest OS to access the App Net Manager once I had the gateway configured. Each of these mistakes took hours to overcome. Each failed configuration required starting over from scratch, and once I got far enough into the install I could not go back and had to start over with another account. I am still trying to figure out how to reset the configuration for my initial account. Hopefully my slings and arrows will help you avoid the pitfalls of outrageous installations.

Making Hadoop easier

Mon, 2016-09-26 02:07
Last week we looked at provisioning a Hadoop server and realized that the setup was a little complex and somewhat difficult. This is what people typically do the first time they want to provision a service: they download the binaries (or the source if you are really crazy) and install everything from scratch. Our recommendation is to do everything this way the first time. It helps you get a better understanding of how the setup works and what the dependencies are. For example, Hadoop 2.7.3 required Java 1.8 or greater; if we go with Hadoop 2.7.2 we can get by with Java 1.7.

Rather than going through all of the relationships, requirements, and libraries needed to get something working, we are going to do what we would typically do to spin up a server when we suddenly need one up and running: go to a service that provides pre-compiled and pre-configured public domain code sandboxes and get everything running that way. The service of choice for the Oracle Compute Cloud is Bitnami. We can search for a Hadoop configuration and provision it into our IaaS foundation. Note that we could do the same using Amazon EMR and get the same results; the key differences between the two are configuration, number of servers, and cost. We are going to go through the Bitnami deployment on the Oracle Cloud in this blog.

Step 1 Search for Hadoop on http://oracle.bitnami.com and launch the instance into your region of choice.

Step 2 Configure and launch the instance. We give the instance a name, we increase the default disk size from 10 GB to 60 GB to have room for data, we go with the hadoop 2.7.2-1 version, select Oracle Linux 6.7 as the OS (Ubuntu is an alternative), and go with a small OC3 footprint for the compute size. Don't change the security rules. A new one will be generated for you as well as the ssh keys when you provision through this service.

Step 3 Log into your instance. To do this you will need ssh and the keys that Bitnami generates for you. The instance creation takes 10-15 minutes and ends with a screen showing the IP address along with links to download the keys.
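
A sketch of the login step, assuming the key was saved as bitnami-hadoop.pem and that the image uses the bitnami account (both are assumptions; use whatever the launch screen actually shows):

   chmod 600 ~/Downloads/bitnami-hadoop.pem
   ssh -i ~/Downloads/bitnami-hadoop.pem bitnami@<instance-ip>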

Step 4 Once you have access to the master system you can execute the commands that we ran last week. The only key difference with this implementation is that you will need to install java-1.8 with yum, because by default the development kit is not installed and we need the jar tool as part of the configuration. The steps needed to repeat our tests from the previous blog entry are listed below.

 --- setup hdfs file system 
   hdfs namenode -format
   hdfs getconf -namenodes
   hdfs dfs -mkdir input
   cp /opt/bitnami/hadoop/etc/hadoop/*.xml input
   hdfs dfs -put input/*.xml input
 --- setup simple test with wordcount
   hdfs dfs -mkdir wordcount
   hdfs dfs -mkdir wordcount/input
   mkdir ~/wordcount
   mkdir ~/wordcount/input
   vi file01
   mv file01 ~/wordcount/input
   vi ~/wordcount/input/file02
   hdfs dfs -put ~/wordcount/input/* wordcount/input
   vi WordCount.java
 --- install java-1.8 to get all of the libraries
   sudo yum install java-1.8\*
 --- create wc.jar file
   export HADOOP_CLASSPATH=/opt/bitnami/java/lib/tools.jar
   hadoop com.sun.tools.javac.Main WordCount.java
   jar cf wc.jar WordCount*.class
   hadoop jar wc.jar WordCount wordcount/input wordcount/output
   hadoop fs -cat wordcount/output/part-r-00000
 --- download data and test pig
   mkdir data
   cd data
   wget http://stat-computing.org/dataexpo/2009/1987.csv.bz2
   wget http://stat-computing.org/dataexpo/2009/1988.csv.bz2
   bzip2 -d 1987.csv.bz2
   bzip2 -d 1988.csv.bz2
   hdfs dfs -mkdir airline
   hdfs dfs -copyFromLocal 19*.csv airline
   vi totalmiles.pig
   pig totalmiles.pig
   hdfs dfs -cat data/totalmiles/part-r-00000

Note that we can do the exact same thing using Amazon AWS. They have a MapReduce product called EMR. If you go to the main console and click on EMR at the bottom of the screen, you can create a Hadoop cluster. Once you get everything created and can ssh into the master, you can repeat the steps above.

I had a little trouble with the WordCount.java program in that the library versions were a little different. The JVM 1.7 libraries had a problem linking, and adding the JVM 1.8 binaries did not properly work with the Hadoop binaries. You also need to change HADOOP_CLASSPATH to point to the proper tools.jar file, since it is in a different location in the Bitnami install. I think with a little tweaking it would all work. The pig sample code works with no problem, so we were able to test that without changing anything.
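
For reference, a hedged sketch of that classpath change, assuming the yum install above placed the JDK under /usr/lib/jvm (the exact directory name depends on the package version, so list the directory first and adjust the path):

   ls /usr/lib/jvm
   export HADOOP_CLASSPATH=/usr/lib/jvm/java-1.8.0-openjdk.x86_64/lib/tools.jar
   hadoop com.sun.tools.javac.Main WordCount.java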

In summary, provisioning a Hadoop server or cluster in the cloud is very easy if someone else has done the heavy lifting and pre-configured a server or group of servers for you. I was able to provision two clusters before lunch, run through the exercises, and still have time to go through it again to verify. Using a service like private Marketplaces, Bitnami, or the AWS Marketplace makes it much simpler to deploy sandbox images.

Hadoop on IaaS - part 2

Fri, 2016-09-23 02:07
Today we are going to get our hands dirty and install a single instance standalone Hadoop cluster on the Oracle Compute Cloud. This is a continuing series on installing public domain software on Oracle Cloud IaaS. We are going to base our installation on three components: Oracle Linux 6.7 as the operating system, Java 1.8 installed through yum, and the Hadoop 2.7.3 tar bundle from the Apache Hadoop home page. We are using Oracle Linux 6.7 because it is the easiest to install on Oracle Compute Cloud Services. We could have done Ubuntu or SUSE or Fedora and followed some of the tutorials from HortonWorks or Cloudera or the Apache Single Node Cluster instructions. Instead we are going old school and installing from the Hadoop home page by downloading a tar ball and configuring the operating system to run a single node cluster.

Step 1:

Install Oracle Linux 6.7 on an Oracle Compute Cloud instance. Note that you can do the same thing by installing on your favorite virtualization engine like VirtualBox, VMWare, HyperV, or any other cloud vendor. The only true dependency is the operating system beyond this point. If you are installing on the Oracle Cloud, go with the OL_67_3GB..... option, go with the smallest instance, delete the boot disk, replace it with a 60 GB disk, rename it and launch. The key reason that we need to delete the boot disk is that by default the 3 GB disk will not take the Hadoop binary. We need to grow it to at least 40 GB. We pad a little bit with a 60 GB disk. If you check the new disk as a boot disk it replaces the default Root disk and allows you to create an instance with a 60 GB disk.

Step 2:

Run yum to update the OS and to install wget and Java version 1.8. You need to log in to the instance as opc so that you can run commands as root with sudo.
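The exact commands will look something like the following (the OpenJDK package names are an assumption based on the standard Oracle Linux 6 yum repositories; the -devel package is what provides the tools.jar we reference later):

  sudo yum -y update
  sudo yum -y install wget
  sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel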

Note that we are going to diverge from the Hadoop for Dummies book that we referenced yesterday. It suggests attaching to a yum repository and doing an install from the repository for the bigtop package. We don't have that option for Oracle Linux and need to do the install from the binaries by downloading a tar or src image. The bigtop package basically takes the Apache Hadoop bundle and translates it into rpm files for an operating system. Oracle does not provide this as part of the yum repository and Apache does not create one for Oracle Linux or RedHat. We are going to download the tar file from the links provided at the Apache Hadoop homepage and follow the install instructions for a single node cluster.

Step 3:

Get the tar.gz file by pulling it from http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
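With wget installed in Step 2, pulling the archive into the opc home directory looks something like:

  cd /home/opc
  wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz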

Step 4: We unpack the tar.gz file with the tar xvzf hadoop-2.7.3.tar.gz command

Step 5:

Next we add the following to the .bashrc file in the home directory to set up some environment variables. The Java location is where the yum install placed it; the Hadoop location is based on downloading and unpacking the tar file into the opc home directory.

export JAVA_HOME=/usr
export HADOOP_HOME=/home/opc/hadoop-2.7.3
export HADOOP_CONFIG_DIR=/home/opc/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=/home/opc/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/opc/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/opc/hadoop-2.7.3
export YARN_HOME=/home/opc/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin

Step 6

Source the .bashrc to pull in these environment variables
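Assuming the exports above were appended to /home/opc/.bashrc, reloading them in the current shell is simply:

  source ~/.bashrc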

Step 7 Edit the /etc/hosts file to add an entry for namenode.
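A minimal sketch of the hosts entry, assuming a single node setup where namenode can simply point at the loopback address (the private IP of the instance works just as well):

  127.0.0.1   localhost namenode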

Step 8

Setup ssh so that we can loop back to localhost and launch an agent. I had to edit the authorized_keys file to add a newline before the new entry. If you don't, the ssh won't work.

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
vi ~/.ssh/authorized_keys
ssh localhost
exit

Step 9 Test the configuration then configure the hadoop file system for single node.

cd $HADOOP_HOME
mkdir input
cp etc/hadoop/*.xml input
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
vi etc/hadoop/core-site.xml

When we ran this there were a couple of warnings, which we can ignore. The test should finish without error and generate a long output list. We then edit the core-site.xml file by changing the following lines at the end.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>

Step 10

Create the hadoop file system with the command hdfs namenode -format

Step 11

Verify the configuration with the command hdfs getconf -namenodes
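Given the fs.defaultFS value we set in core-site.xml, the command should echo back the host we configured, so the check looks something like:

  hdfs getconf -namenodes
  (expected to print: namenode)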

Step 12

Start the hadoop file system with the command sbin/start-dfs.sh
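One quick sanity check, assuming the OpenJDK jps tool is on the path (it ships with the -devel package installed in Step 2), is to list the running Java daemons and look for NameNode, DataNode, and SecondaryNameNode:

  jps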

At this point we have the hadoop filesystem up and running. We now need to configure MapReduce and test functionality.

Step 13

Make the HDFS directories required to execute MapReduce jobs with the commands

  hdfs dfs -mkdir /user
  hdfs dfs -mkdir /user/opc
  hdfs dfs -mkdir input
  hdfs dfs -put etc/hadoop/*.xml input

Step 14 Run a MapReduce example and look at the output

  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep 
    input output 'dfs[a-z.]+'
  hdfs dfs -get output output
  cat output/* output/output/*

Step 15

Create a test program to do a wordcount of two files. This example comes from an Apache MapReduce tutorial.

hdfs dfs -mkdir wordcount
hdfs dfs -mkdir wordcount/input
mkdir ~/wordcount
mkdir ~/wordcount/input
vi ~/wordcount/input/file01
 - add 
Hello World Bye World
vi ~/wordcount/input/file02
- add
Hello Hadoop Goodbye Hadoop
hdfs dfs -put ~/wordcount/input/* wordcount/input
vi ~/wordcount/WordCount.java

Create WordCount.java with the following code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

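  // Mapper: splits each input line into tokens and emits a (word, 1) pair for every token.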
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

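  // Reducer (also used as the combiner): sums all of the counts emitted for a given word.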
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

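  // Driver: configures and submits the job; args[0] is the HDFS input directory and args[1] is the output directory.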
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Step 16

Compile and run the WordCount.java code. Note that the JAVA_HOME and HADOOP_CLASSPATH values below are tied to the exact OpenJDK build that yum installed; adjust the version string to match what is on your instance.

cd ~/wordcount
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64
export HADOOP_CLASSPATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
hadoop jar wc.jar WordCount wordcount/input wordcount/output
hadoop fs -cat wordcount/output/part-r-00000
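For the two input files created in Step 15 the counts are deterministic, so the final cat should print something like:

  Bye 1
  Goodbye 1
  Hadoop 2
  Hello 2
  World 2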

At this point we have a working system and can run more MapReduce jobs, look at results, and play around with Big Data foundations.

In summary, this is a relatively complex example. We have moved beyond a simple install of an Apache web server or Tomcat server and editing some files to get results. We have the foundations for a Big Data analytics solution running on the Oracle Compute Cloud Service. The steps to install are very similar to the other installation tutorials that we referenced earlier on Amazon and Virtual Machines. Oracle Compute is a good foundation for public domain code. Per core, compute is cheaper than with other cloud vendors. Networking is non-blocking and offers higher performance. Storage throughput is higher, optimized for I/O-heavy compute workloads, and tied directly to the compute engine. Hopefully this tutorial has given you the foundation to start playing with Hadoop on Oracle IaaS.

Hadoop on IaaS

Thu, 2016-09-22 02:07
We are going to try a weekly series of posts that talks about public domain code running on the Oracle Cloud IaaS platform. A good topic seems to be Big Data and Hadoop. Earlier we talked about running Tomcat on IaaS as well as WordPress on IaaS using bitnami.com. To start this process we are going to review what Big Data and Hadoop are. We are going to start where most people start: looking at what books are available on the subject and walking through one or two of them. Today we are going to start with Hadoop for Dummies by Dirk deRoos. It is not the definitive source on Hadoop but it is a good place to have terms and concepts defined for us.

Years ago one of the big business trends was to create a data warehouse. The idea was to take all of the corporate operational data, put it into one database, and grind on it to generate reports. History has shown that aggregating the data was a difficult task, as was finding the processing power required to grind through reports. The task took significant resources to architect the data, to host the data, and to write select statements to generate reports for users. As retail got more and more ingrained on the web, sources outside the company became highly relevant and influential on products and services. Big Data and Hadoop come with tools to pull in non-structured data from sources like Twitter, Yelp, and other public web services and correlate comments and reviews with products and services.

The three characterizations of Big Data according to Hadoop for Dummies are

  • Volume - high volumes of data ranging from dozens of terabytes to petabytes.
  • Variety - data that is organized in multiple structures, ranging from raw text to log files.
  • Velocity - data that enters an organization has some kind of value for a limited amount of time. The higher the volume of data entering an organization per second, the bigger the velocity of change.

Hadoop is architected to handle high volumes of data and data with a variety of structures, but it is not necessarily suited to analyzing data in motion as it enters the organization; it works on data once it is stored and at rest.

Since we touched on the subject, let's define different data structures. Structured data is characterized by a high degree of organization and is typically stored in a database or spreadsheet. There is a relational mapping to the data and programs can be written to analyze and process the relationships. Semi-structured data is a bit more difficult to understand than structured data. It is typically stored in the form of text data or log files. The data is typically somewhat structured and is either comma, tab, or character delimited. Unfortunately, multiple log files have different formats, so the formatting is different for each file and parsing and analysis is a little more challenging. Unstructured data has none of the advantages of the other two data types. Structure might be in the form of directory structure, server location, or file type. The actual architecture of the data might or might not be predictable and needs a special translator to parse the data. Analyzing this type of data typically requires a data architect or data scientist to look at the data and reformat it to make it usable.

From the Dummies guide again, Hadoop is a framework for storing data on large clusters of commodity hardware. This lends itself well to running on a cloud infrastructure that is predictable and scalable. Layer 3 networking is the foundation for the cluster. An application that is running on Hadoop gets its work divided among the nodes in the cluster. Some nodes aggregate data through MapReduce or YARN and the data is stored and managed by other nodes using a distributed file system known as the Hadoop distributed file system (HDFS). Hadoop started back in 2002 with the Apache Nutch project. The purpose of this project was to create the foundation for an open source search engine. The project needed to be able to scale to billions of web pages and in 2004 Google published a paper that introduced MapReduce as a way of parsing these web pages.

MapReduce performs a sequence of operations on distributed data sets. The data consists of key-value pairs and processing has two phases, mapping and data reduction. During the map phase, input data is split into a large number of fragments, each of which is assigned to a map task. Each map task processes the key-value pairs it is assigned and produces a set of intermediate key-value pairs. This data is sorted by key and stored into a number of fragments that matches the number of reduce tasks. If, for example, we are trying to parse data for the National Football League in the US we would want to spawn 32 task nodes so that we could parse data for each team in the league. Fewer nodes would cause one node to do double duty and more than 32 nodes would cause a duplication of effort. During the reduction phase each task processes the data fragment that was assigned to it and produces an output key-value pair. For example, if we were looking for passing yardage by team we would spawn 32 task nodes. Each node would look for yardage data for each team and categorize it as either passing or rushing yardage. We might have two quarterbacks play for a team or have a wide receiver throw a pass. The key for this team would be the passer and the value would be the yards gained. These reduce tasks are distributed across the cluster and the results of their output are stored on the HDFS when finished. We should end up with 32 data files from 32 different task nodes updating passing yardage by team.

Hadoop is more than just distributed storage and MapReduce. It also contains components to help administer and coordinate servers (HUE, Ambari, and Zookeeper), data movement management (Flume and Sqoop), resource management (YARN), processing frameworks (MapReduce, Tez, Hoya), workflow engines (Oozie), data serialization (Avro), data collection (MapReduce, Pig, Hive, and HBase), and data analysis (Mahout). We will look into these systems individually later.

There are commercial and public domain offerings for Hadoop.

A good project to start with for a small Hadoop deployment is log analysis. If you have a web server, it generates logs every time that a web page is requested. When a change is made to the web site, logs are generated when people log in to manage the pages or change the page content. If your web site is a transactional system, orders are being placed for goods and services and credit card transactions are being processed. All of these generate log files. If we wanted to look at a product catalog and correlate what people look at in relationship to what is ordered, we could do what Amazon has done for years. We could come up with recommendations on what other people are looking at as well as what other people ordered along with this item. If, for example, someone is buying a pair of athletic shoes, a common companion purchase is socks. We could give a recommendation on socks that could go with the shoes or a shoe deodorant product that yields a higher profit margin. These items could be displayed with the product in the catalog or shopping cart to facilitate more goods sold on the web. We can also look at the products that no one is looking at and reduce our inventories since they are not even getting looked at casually.

We can also use Hadoop as a fraud detection or risk modeling engine. Both provide significant value to companies and allow executives to look at revenue losses as well as potential transactions that could cause a loss. For example, we might want to look at the packing material that we use for a fragile item that we sell. If we have a high rate of return on a specific item we might want to change the packing, change the shipper, or stop shipping to a part of the country that tends to have a high return rate. Any and all of these solutions can be implemented but a typical data warehouse will not be able to coordinate the data and answer these questions. Some of the data might be stored in plain text files or log files on our return web site. Parsing and processing this data is a good job for Hadoop.

In the upcoming weeks we will dive into installation of a Hadoop framework on the Oracle Cloud. We will look at resources required, pick a project, and deploy sample code into an IaaS solution. We will also look at other books and resources to help us understand and deploy sandboxes to build a prototype that might help us solve a business problem.

Corente DataCenter Setup

Tue, 2016-09-20 10:55
Yesterday we went through the theory of setting up a VPN to connect a subnet in our data center to a subnet in the Oracle Cloud. Today we are going to go through the setup of the Corente Gateway in your data center. We will be following the Corente Service Gateway Setup. Important: this lab has problems because Corente does not work with VirtualBox.

The first step is to ensure that we have a Linux server in our data center that we can install the services on. We will be installing these services on an Oracle Linux 6.7 release running in VirtualBox. To get started we install a new version from an iso image. We could just as easily have cloned an existing instance. For the installation we select the software development desktop and add some administration tools to help look at things later down the road.

According to the instructions we need to make sure that our user has sudo rights and can reconfigure network settings as well as access the internet to download code. This is done by editing the /etc/sudoers file and adding our oracle user to the access rights. We then run

modprobe -v kvm-intel
egrep '^flags.*(vmx|svm)' /proc/cpuinfo

to verify that we have the right type of virtualization needed to run the VPN software. It turns out that VirtualBox does not support nested virtualization which is needed by the Corente software. We are not able to run the Corente Gateway from a VirtualBox instance.

We need to follow a different set of instructions and download the binaries for the Corente Gateway Services - Virtual Environment. Unfortunately, this version was deprecated in version 9.4. We are at a roadblock and need to look at alternatives for connecting Corente Gateway Services from our sandbox to the Oracle Cloud.

I debated continuing on or showing different failed paths in this post. I decided that showing a failed attempt had as much value as showing a successful attempt. Our first attempt was to install the gateway software on a virtual instance using VirtualBox since it is a free product. Unfortunately, we can't do this since VirtualBox does not support passing the hardware virtualization features of the Intel Xeon chip into the guest operating system. The second attempt was to go with a binary specifically designed to work with VirtualBox and load it. It turns out that this version was decommitted and there really is no solution that works with VirtualBox. Tomorrow we will look for alternatives for running the gateway on a native Windows host and a MacOS host since I use both to write this blog. Installing a gateway on a physical host is not optimum because we might need to reconfigure ethernet connections. My preference is to stay in a sandbox but setting up an OracleVM server, VMWare server, or HyperV server would all be difficult at best. An alternative that we might look at is setting up our gateway server in another cloud instance and connecting one cloud vendor to another cloud vendor. It all depends on who exposes the hardware virtualization to their guest instances. More on that tomorrow.

connecting subnets

Mon, 2016-09-19 02:07
This week we are going to focus on connecting computers. It seems like we have been doing this for a while. We have looked at connecting our desktop to a cloud server (Linux and Windows). We have also looked at hiding a server in the cloud so that you can only get to it from a proxy host and not from your desktop or anywhere else on the cloud. In this blog we are going to start talking about changing the configuration so that we create a new network interface on our desktop and use this new network interface to connect through a secure tunnel to a second network interface on a compute cloud. A good diagram of what we are trying to accomplish is included in the original blog post.

Note that there are three components to this. The first is Corente Gateway running in your data center. The second is Corente Gateway running in the Oracle Cloud. The third is the Corente App Net Manager Service Portal. The basic idea behind this is that we are going to create a second network as we did with hiding a server in the cloud. We initially set up one instance so that we could access it through a public IP address 140.86.14.242. It also has a private IP address for inside the Oracle Cloud of 10.196.135.XXX. I didn't record the internal IP address because I rarely use it for anything. The key here is that by default a subnet range of 10.196.135.0/24 was configured. We were able to connect to our second server at 10.196.135.74 because we allowed ssh access on the private subnet but not on the public network. When this blog was initially written Oracle did not support defining your own subnet and assigned you to a private network based on the rack that you got provisioned into inside the Cloud data center. As of OpenWorld, subnet support was announced so that you could define your own network range. One of the key pieces of feedback that Oracle got was that customers did not like creating a new subnet in their data center to match the Oracle subnet. They would rather define their own subnet in the Oracle Cloud to match the subnets that they have in their own data center.

Let's take each of these components one at a time. First the Corente Gateway running in your data center. This is a virtual image that you download or software components that you install on a Linux system and run in your data center. The concept here is that you have a (virtual) computer that runs in your data center. The system has two network interfaces attached. The first can connect to the public internet through network address translation or is directly connected to the internet or through a router. The second network interface connects to your private subnet. This IP address is typically not routable like a 10. or 192.168. network. There is no chance of mistakenly reaching a machine on the public internet because these networks are non-routable. The key is that the Corente Gateway actually has a listener that looks for communications intended for this non-routable network and replicates the packets through a secure tunnel to the Corente Gateway running in the Oracle Cloud. All of the traffic passes from your local network which is non-routable to another network hundreds or thousands of miles away and gives you the ability to talk to more computers on that network. This effectively gives you a private virtual network from your data center to a cloud data center.

Rather than using a software virtual gateway you can use a hardware router to establish this same connection. We are not going to talk about this as we go through our setup exercises but realize that it can be done. This is typically what a corporation does to extend resources to another data center, another office, or a cloud vendor for seasonal peak periods or cheaper resources. The benefit to this configuration is that it can be done by corporate IT and not by an individual.

The key things that get set up during this virtual private network connection are name parsing (DNS), ip routing (gateways and routers), and broadcast/multicast of messages. Most VPN configurations support layer 3 and above. If you do an ARP request, the ARP is not passed through the VPN and never reaches the other data center. Corente uses the GRE tunneling protocol, which is a layer 2 option. Supporting layer 2 allows you to route ping requests, multicast requests, and additional tunnel requests at a much lower and faster level. As we discussed in an earlier blog, Microsoft does not allow layer 2 to go into or out of their Azure cloud. Amazon allows layer 2 inside their cloud but not into and out of their cloud. This is a key differentiator between the AWS, Azure, and Oracle clouds.

The second component of the virtual private network is the Oracle Cloud Corente Gateway. This is the target side where the gateway in your data center is the initiator. Both gateways allow traffic to go between the two networks on the designated subnet. Both gateways allow for communication between servers in the data center and servers in the Oracle cloud. When you combine the VPN gateways with the Security Lists and Security Rules you get a secure network that allows you to share a rack of servers and not worry about someone else who is using the Oracle Cloud accessing your data center even if they are assigned an IP address on the same subnet. When you define a Security List or Security Rule, these exceptions and holes allow for traffic from computers in your account to access the VPN. No one else in the same rack or same cloud data center can access your VPN or your data center.

The third component is the app net management service portal. This portal establishes connections and rules for the other two components. When you install each of the components it communicates with the admin portal to get configuration information. If you need to change a configuration or keys or some aspect of the communication protocol this is done in the admin portal and it communicates to the other two components to update the configuration. This service also allows you to monitor traffic and record traffic between the Oracle Cloud and your data center.

The network resources for your data center installed service will have two bridged interfaces, with br0 being your public facing connection and br1 connecting to your subnet in your data center. A similar configuration is done in the Oracle Cloud but this is pre-configured and can be provisioned as a public image. The only thing that you need to configure is the subnet address range and the relationship with the app net management service portal.

Today was a lot of theory and high level discussion. Tomorrow we will dive into configuration of the gateway in your data center. The day after that we will look at provisioning the gateway in the Oracle Cloud and connecting the two. Just a quick reminder, we talked about how to establish a connection between your desktop and a cloud server. By going to a VPN configuration we will get around having to hide a server in the cloud. We can set up all of our servers to have private network links and only open up web servers or secure web servers to talk to the public internet. We can use ssh and rdp from our desktops at home or in our offices to communicate to the cloud servers. Setting up the VPN and giving you access to the resources is typically a corporate responsibility. What you need to know is what cloud resources you have access to and how much money you have in your budget to solve your business problem.

subnets

Tue, 2016-08-30 09:24
Let's take a step back and look at networking from a different perspective. A good reference book to start with from a corporate IT perspective is CCENT Cisco Certified Entry Networking Technician ICND1 Study Guide (Exam 100-101) with Boson NetSim Limited Edition, 2nd Edition. You don't need to read this book to get Cisco Certified but it does define terms and concepts well.

At the lowest level you start with a LAN or local area network. From the study guide, "Usually, LANs are confined to a single room, floor, or building, although they can cover as much as an entire campus. LANs are generally created to fulfill basic networking needs, such as file and printer sharing, file transfers, e-mail, gaming, and connectivity to the Internet or outside world." This typically is connected with a single network hub or a series of hubs and a router or gateway connects us to a larger network or the internet. The key services that you need on a LAN are a naming service and gateway service. The naming service allows you to find services by name rather than ip address. The gateway service allows you to connect to services that are not on your local network. It is basically as simple as that. A gateway typically also acts as a firewall and/or network address translation device (NAT). The firewall either allows or blocks connections to a specific port on a specific ip address. It might have a rule that says drop all traffic or allow traffic from anywhere, from a network range, or from a specific network address. Network address translation allows you to communicate to the outside world from your desktop on a private nonroutable ip address and have the service that you are connecting to know how to get back to you. For example, my home network has an internet router that connects to AT&T. When the router connects to AT&T, it gets a public ip address from the internet provider. This address is typically something like 90.122.5.12. This is a routable address that can be reached anywhere from the internet. The router assigns an ip address to my desktop and uses the address range 192.168.1.0 to 192.168.1.100 to assign the addresses. This implies that I can have 101 devices in my house. When I connect to gmail.com to read my email I do a name search for gmail.com and get back the ip address. My desktop, assigned to 192.168.1.100, does an http get from gmail.com on port 80. This http request is funneled through my internet router which changes the ip header assigning the transmitter ip address to 90.122.5.12. It keeps track of the assignment so that a response coming back from gmail.com gets routed back to my desktop rather than my kids' desktop on the same network. To gmail.com it looks like I am connecting from AT&T and not from my desktop.

It is important to take our discussion back to layer 2 and layer 3 when talking about routing. If we are operating on a LAN, we can use layer 2 multicast to broadcast packets to all computers on our local network. Most broadband routers support all of layer 3 and part of layer 2. You can't really take a video camera in your home and multicast it to your neighbors so that they can see your video feed but you can do this in your home. You can ping their broadband router if you know the ip address. Typically the ip address of a router is not mapped to a domain name so you can't really ask for the ip address of the router two houses down. If you know their ip address you can setup links between the two houses and through tcp/ip or udp/ip share video between the houses.

If we want to limit the number of computers that we can put on our home or office network we use subnet masks to limit the ip address range and program the router to look for all ip addresses in the netmask range. The study guide book does a good job of describing subnetting. As an example, consider a subnet with a network id of 192.168.1.64 and a netmask of 255.255.255.192, which can host around sixty computers.

Note that we have defined a network with a network id of 192.168.1.64 by using netmask 255.255.255.192, which covers the 64 addresses from 192.168.1.64 to 192.168.1.127 and leaves 62 usable hosts after the network and broadcast addresses are reserved. If we put a computer with an ip address of 192.168.1.200 on this network we won't be able to connect to the internet and we won't be able to use layer 2 protocols to communicate with all of the computers on this network. With this configuration we have effectively created a subnet inside our network. If we combine this with the broadcast address that is used when we create our network connection we can divide our network into ranges. The study guide book goes through an exercise of setting up a network for different floors in an office and limiting each floor to a fixed number of computers and devices.

One of the design challenges faced by people who write applications is where you layer security and connectivity. Do you configure an operating system firewall to restrict address ranges that it will accept requests from? Do you push this out to the network and assume that the router will limit traffic on the network? Do you push this out to the corporate or network firewall and assume that everything is stopped at the castle wall? The real answer is yes. You should set up security at all of these layers. When you make an assumption things fall apart when someone opens an email and lets the trojan horse through the castle gates.

If you look at the three major cloud vendors they take different approaches to subnet assignment. Microsoft and Oracle don't let you configure the subnet that you are assigned to. You get assigned to a subnet and have little choice on the ip address range for the computers that you are placed upon in the cloud solution. Amazon allows you to define a subnet and ip address range. This is good and bad. It makes routing a little more difficult in the cloud and address translation needs to be programmed for the subnet that you pick. Vendors that assign an ip address range have hardwired routing for that network. This optimizes routing and simplifies the routing tables. Amazon faces problems with EC2 and S3 connectivity and ends up charging for data transmitted from S3 to EC2. Bandwidth is limited with these connections partly due to routing configuration limitations. Oracle and Microsoft have simpler routing maps and can put switched networks between compute and storage which provides a faster and higher throughput storage network connection.

The fun part comes when we want to connect our network, which is on a non-routable subnet, to our neighbors. We might want to share our camera systems and record them into a central video archive. Corporations face this when they want to create a cloud presence yet keep servers in their data center. Last week we talked about hiding a server in the cloud and putting our database where you can't access it from the public internet. This is great for security but what happens when we need to connect with sql developer to the database to upload a new stored procedure? We need to be able to connect to this private subnet and map it to our corporate network. We would like to be able to get to 10.10.1.122 from our network which is mapped to 192.168.1.0. How do we do this? There are two approaches. First, we can define a secondary network in our data center to match the 10.10.1.0 network and create a secure tunnel between the two networks. The second is to remap the cloud network to the 192.168.1.0 subnet and create a secure tunnel between the two networks. Do you see a common theme here? You need a secure tunnel with both solutions and you need to change the subnet either at the cloud host or in your data center. Some shops have the flexibility to change subnets in their corporate network or data center to match the cloud subnet (as is required with Oracle and Microsoft) while others require the cloud vendor to change the subnet configuration to match their corporate policy (Amazon provides this).

Today we are not going to dive deep into virtual private networks, IPSec, or secure tunnels. We are going to touch on the subjects and discuss them in depth later. The basic concept is a database developer working on their desktop needs to connect to a database server in the cloud. A Java developer working on their desktop needs to connect to a Java server in the cloud. We also need to hide the database server so that no one from the public internet can connect to the database server. We want to limit the connection to the Java server to be port 443 for secure https to public ip addresses and allow ssh login on port 22 from our corporate network. If we set a subnet mask, define a virtual private secure network between our corporate network and cloud network, and allow local desktops to join this secure network we can solve the problem. Defining the private subnet in the cloud and connecting it to our corporate network is not enough. This is going back to the castle wall analogy. We want to define firewall rules at the OS layer. We want to define routing protocols between the two networks and allow or block communication at different layers and ports. We want to create a secure connection from our sql developer, java developer, or eclipse development tools to our production servers. We also want to facilitate tools like Enterprise Manager to measure and control configurations as well as notify us of overload or failure conditions.

In summary, there are a variety of decisions that need to be made when deploying a cloud solution. Letting the application developer deploy the configuration is typically a bad idea because they don't think of all of the corporate requirements. Letting the IT Security specialist deploy the configuration is also a bad idea. The solution will be so limiting that it makes the cloud services unusable. The architecture needs to be a mix of accessibility, security, as well as usability. Network configurations are not always the easiest discussion to have but critical to have early in the conversation. This blog is not trying to say that one cloud vendor is better than the other but trying to simply point out the differences so that you as a consumer can decide what works best for your problem.

subnets

Tue, 2016-08-30 09:24
Let's take a step back and look at networking from a different perspective. A good reference book to start with from a corporate IT perspective is CCENT Cisco Certified Entry Networking Technician ICND1 Study Guide (Exam 100-101) with Boson NetSim Limited Edition, 2nd Edition. You don't need to read this book to get Cisco Certified but it does define terms and concepts well.

At the lowest level you start with a LAN or local area network. From the study guide, "Usually, LANs are confined to a single room, floor, or building, although they can cover as much as an entire campus. LANs are generally created to fulfill basic networking needs, such as file and printer sharing, file transfers, e-mail, gaming, and connectivity to the Internet or outside world." A LAN is typically connected with a single network hub or a series of hubs, and a router or gateway connects us to a larger network or the internet. The key services that you need on a LAN are a naming service and a gateway service. The naming service allows you to find services by name rather than ip address. The gateway service allows you to reach services that are not on your local network. It is basically as simple as that. A gateway typically also acts as a firewall and/or network address translation (NAT) device. The firewall either allows or blocks connections to a specific port on a specific ip address. It might have a rule that says drop all traffic, or allow traffic from anywhere, from a network range, or from a specific network address. Network address translation allows you to communicate with the outside world from a desktop on a private non-routable ip address and have the service that you are connecting to know how to get back to you. For example, my home network has an internet router that connects to AT&T. When the router connects to AT&T, it gets a public ip address from the internet provider, typically something like 90.122.5.12. This is a routable address that can be reached from anywhere on the internet. The router assigns ip addresses to my desktops from the range 192.168.1.0 to 192.168.1.100, which gives me roughly a hundred usable addresses in the house. When I connect to gmail.com to read my email I do a name lookup for gmail.com and get back its ip address. My desktop, assigned 192.168.1.100, does an http get from gmail.com on port 80. This http request is funneled through my internet router, which rewrites the source address in the ip header to 90.122.5.12. The router keeps track of the translation so that a response coming back from gmail.com gets routed to my desktop rather than my kids' desktop on the same network. To gmail.com the connection appears to come from the AT&T-assigned address, not from my desktop.
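The translation the router performs is the same masquerading trick a Linux gateway would do. A minimal sketch, assuming a Linux box with eth0 facing the provider and eth1 facing the 192.168.1.0/24 home network (both interface names are placeholders):

```
# turn the box into a router and rewrite outbound source addresses
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# forward LAN traffic out and let replies for established connections back in
sudo iptables -A FORWARD -i eth1 -o eth0 -s 192.168.1.0/24 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
```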

It is important to take our discussion back to layer 2 and layer 3 when talking about routing. If we are operating on a LAN, we can use layer 2 multicast to broadcast packets to all computers on our local network. A broadband router routes layer 3 traffic but does not forward layer 2 broadcast or multicast traffic beyond your local network. You can't take a video camera in your home and multicast the feed to your neighbors so that they can see your video, but you can do this within your own home network. You can ping a neighbor's broadband router if you know its ip address. Typically the ip address of a router is not mapped to a domain name, so you can't really ask for the ip address of the router two houses down. If you do know their ip address you can set up links between the two houses and share video over TCP/IP or UDP/IP.

If we want to limit the number of computers that we can put on our home or office network, we use subnet masks to limit the ip address range and program the router to recognize only the ip addresses in that range. The study guide book does a good job of describing subnetting. The example below shows how to use a netmask to define a network that can host around sixty computers.
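A quick way to check the numbers yourself, assuming a machine with python3 available, is the standard ipaddress module:

```
python3 -c '
import ipaddress
net = ipaddress.ip_network("192.168.1.64/26")            # netmask 255.255.255.192
print("network:      ", net.network_address)             # 192.168.1.64
print("broadcast:    ", net.broadcast_address)           # 192.168.1.127
print("usable hosts: ", net.num_addresses - 2)           # 62 (.65 through .126)
print("192.168.1.200 in subnet?",
      ipaddress.ip_address("192.168.1.200") in net)      # False
'
```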

Note that we have defined a network with a network id of 192.168.1.64 by using netmask 255.255.255.192 (a /26), which limits us to 62 usable host addresses (192.168.1.65 through 192.168.1.126, with 192.168.1.127 as the broadcast address). If we put a computer with an ip address of 192.168.1.200 on this network it won't be able to reach the internet through this gateway and won't be able to use layer 2 protocols to talk to the other computers on the network. With this configuration we have effectively created a subnet inside our network. If we combine this with the broadcast address that is assigned when we create our network connection we can divide our network into ranges. The study guide book goes through an exercise of setting up a network for different floors in an office and limiting each floor to a fixed number of computers and devices.

One of the design challenges faced by people who write applications is where to layer security and connectivity. Do you configure an operating system firewall to restrict the address ranges it will accept requests from? Do you push this out to the network and assume that the router will limit traffic on the network? Do you push this out to the corporate or network firewall and assume that everything is stopped at the castle wall? The real answer is yes, all of the above. You should set up security at each of these layers. When you rely on a single assumption, things fall apart the moment someone opens an email and lets the Trojan horse through the castle gates.
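As a sketch of what the OS layer of that defense might look like on the Linux server itself, here are a few iptables rules; the 192.168.1.0/24 source range is a placeholder for a corporate network, and equivalent rules would still live in the cloud security list and corporate firewall at their own layers:

```
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT   # ssh from the office only
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT                     # web traffic from anywhere
sudo iptables -A INPUT -j DROP                                         # drop everything else inbound
```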

If you look at the three major cloud vendors they all take the same basic approach. Microsoft and Oracle don't let you configure the subnet that you are assigned to. You are assigned a subnet and have little choice over the ip address range your instances land on. Amazon allows you to define a subnet and ip address range. This is good and bad. It makes routing a little more difficult in the cloud, and address translation needs to be programmed for the subnet that you pick. Vendors that assign the ip address range can hardwire the routing for that network, which optimizes routing and simplifies the routing tables. Amazon faces problems with EC2 and S3 connectivity and ends up charging for data transmitted from S3 to EC2. Bandwidth is limited with these connections partly due to routing configuration limitations. Oracle and Microsoft have simpler routing maps and can put switched networks between compute and storage, which provides a faster and higher throughput storage network connection.

The fun part comes when we want to connect our network, which is on a non-routable subnet, to our neighbors. We might want to share our camera systems and record them into a central video archive. Corporations face the same thing when they want to create a cloud presence yet keep servers in their data center. Last week we talked about hiding a server in the cloud and putting our database where you can't access it from the public internet. This is great for security, but what happens when we need to connect with SQL Developer to the database to upload a new stored procedure? We need to be able to connect to this private subnet and map it to our corporate network. We would like to be able to get to 10.10.1.122 from our network, which is mapped to 192.168.1.0. How do we do this? There are two approaches. First, we can define a secondary network in our data center to match the 10.10.1.0 network and create a secure tunnel between the two networks. The second is to remap the cloud network to the 192.168.1.0 subnet and create a secure tunnel between the two networks. Do you see a common theme here? You need a secure tunnel with both solutions, and you need to change the subnet either at the cloud host or in your data center. Some shops have the flexibility to change subnets in their corporate network or data center to match the cloud subnet (as is required with Oracle and Microsoft) while others require the cloud vendor to change the subnet configuration to match their corporate policy (which Amazon allows).
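To make the first approach concrete: once the secure tunnel is up, the corporate side only needs a route that sends the cloud subnet over the tunnel interface. A minimal sketch, assuming the tunnel presents itself as tun0 and the cloud subnet is 10.10.1.0/24:

```
sudo ip route add 10.10.1.0/24 dev tun0
ip route show | grep 10.10.1          # confirm the route is in place
ping -c 3 10.10.1.122                 # the hidden database server from the example above
```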

Today we are not going to dive deep into virtual private networks, IPSec, or secure tunnels. We are going to touch on the subjects and discuss them in depth later. The basic concept is that a database developer working on their desktop needs to connect to a database server in the cloud, and a Java developer working on their desktop needs to connect to a Java server in the cloud. We also need to hide the database server so that no one from the public internet can connect to it. We want to limit connections to the Java server to port 443 for secure https from public ip addresses and allow ssh login on port 22 from our corporate network. If we set a subnet mask, define a virtual private secure network between our corporate network and the cloud network, and allow local desktops to join this secure network, we can solve the problem. Defining the private subnet in the cloud and connecting it to our corporate network is not enough. This goes back to the castle wall analogy. We want to define firewall rules at the OS layer. We want to define routing between the two networks and allow or block communication at different layers and ports. We want to create a secure connection from development tools like SQL Developer or Eclipse on our desktops to our production servers. We also want to facilitate tools like Enterprise Manager to measure and control configurations as well as notify us of overload or failure conditions.
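As a rough sketch of the OS-layer piece of that design, here is what the host firewalls on the two servers might look like with iptables; 192.168.1.0/24 stands in for the corporate network, 10.10.1.0/26 for the private cloud subnet the Java server sits on, and the real deployment would pair these with cloud security lists and the VPN configuration:

```
# Java server: https from anywhere, ssh only from the corporate network
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT
sudo iptables -A INPUT -j DROP

# database server: the listener reachable only from the Java server's subnet,
# ssh only from the corporate network, nothing from the public internet
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 1521 -s 10.10.1.0/26 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT
sudo iptables -A INPUT -j DROP
```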

In summary, there are a variety of decisions that need to be made when deploying a cloud solution. Letting the application developer deploy the configuration is typically a bad idea because they don't think of all of the corporate requirements. Letting the IT security specialist deploy the configuration alone is also a bad idea; the result is usually so locked down that the cloud services become unusable. The architecture needs to be a mix of accessibility, security, and usability. Network configuration is not always the easiest discussion to have, but it is critical to have it early in the conversation. This blog is not trying to say that one cloud vendor is better than another but simply to point out the differences so that you as a consumer can decide what works best for your problem.

networking differences between cloud providers

Mon, 2016-08-29 09:00
In this blog entry we are going to perform a simple task of enabling an Apache Web Server on a Linux server and look at how to do this on the Oracle Cloud, Amazon AWS, and Microsoft Azure. Last week we did this for the Oracle Cloud but we will quickly review this again. As we go down this path we will look at the different options presented to you as you create a new instance and see how the three cloud vendors diverge in their approach to services. Which version of Linux we select is not critical. We are looking at the cloud tooling and what is required to deploy and secure an instance. Our goals are
  • Deploy a Linux instance into a cloud service
  • Enable port 22 to allow us to communicate from our desktop into the Linux instance
  • Enable port 80 to allow us to communicate from the public internet into the Linux instance
  • Disable all other services coming into this instance.
  • We will use DHCP initially to get an ip address assigned to us but will look at static ip addresses at the end

Step 1: Deploy a Linux instance into a small compute service. Go with the smallest compute shape to save money, the smallest memory allocation because we don't need much for a test web server, the default network interfaces with an ip address assigned, and the smallest disk you can to speed up the process.

Step 1a - Oracle Public Cloud

We go to the Compute Console and click on Create Instance. This takes us through screens that allow us to select an operating system, core count, and memory size. When we get to the instance configuration we have the option of defining network security rules with a Security List. We can either create a new security list or select an existing one. We will select the default, which allows us to connect on port 22, and modify the security list at a later point. We could have selected the WebServer entry from the Security List because we have done this before, but for this exercise we will select the default and come back later to add another access point. Once we get to the review screen we can create the instance. The only networking question we were asked was which Security List definition we want.

Step 1b - Amazon AWS

We go to the EC2 Console and click on EC2 followed by Launch Instance. From the launch screen we select a Linux operating system and start the configuration. Note that the network and subnet menus allow you to deploy your instance into an ip address range of your choosing. This is different from the Oracle Cloud, where you are assigned a non-routable ip address range based on the server that you are dropped into. Since these are private ip addresses for a single server this is really not a significant issue. We are going to accept the defaults here and configure the ports a couple of screens later. We are going to go with a DHCP-assigned public ip address so that we can attach to our web server.

We accept the default storage and configure the ports that we want to open for our instance. We can define a new security group or accept an existing one. For this example we are going to add http port 80 since it is a simple addition at this point and move forward with this configuration. We could go with a predefined configuration that allows ports 80 and 22, but for this example we will create a new one. We then review and launch the instance.
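For comparison, the same security group could be sketched out from the command line with the AWS CLI; the group name, VPC id, group id, and office address range (203.0.113.0/24) below are placeholders, with create-security-group returning the group id used by the ingress calls:

```
aws ec2 create-security-group --group-name web-server \
    --description "ssh from the office, http from anywhere" --vpc-id vpc-0abc1234
aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 \
    --protocol tcp --port 22 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
```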

Step 1c - Microsoft Azure

We go to the Azure Portal and click on Virtual Machine -> Add, which takes us to the Marketplace. From here we type in Linux and pick a Linux operating system image to boot from. We are assigned a subnet, just as we were with the Oracle Cloud, and have the ability to add firewall rules to allow ports 80 and 22 through from the public internet. Once we have this defined we can review and launch our instance.
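The same ports can also be opened after the fact with the Azure CLI, which adds network security group rules for the instance; the resource group and VM names below are placeholders:

```
az vm open-port --resource-group demo-rg --name web-vm --port 80 --priority 900
az vm open-port --resource-group demo-rg --name web-vm --port 22 --priority 910
```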

Step 2: Log into your instance and add the Apache web server. On a yum-based image like Oracle Linux the package is called httpd (apache2 is the name used on Debian-based distributions with apt). We then edit /var/www/html/index.html so that we can see an answer from the web server.
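A minimal sketch of that step, assuming an Oracle Linux or other yum-based image with systemd:

```
sudo yum install -y httpd
sudo systemctl enable --now httpd        # on older images: sudo service httpd start
echo "hello from the cloud" | sudo tee /var/www/html/index.html
```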

Step 3: Verify the network security configuration of the instance to make sure that ports 80 and 22 are open.
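A quick check from the desktop, with 203.0.113.10 standing in for the public ip address reported by the cloud console:

```
nc -zv 203.0.113.10 22         # should report the ssh port as open
curl -I http://203.0.113.10/   # should return an HTTP response header from Apache
```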

Step 3a: Oracle Cloud

When we created the instance we went with the default network configuration, which only has port 22 open. We now need to add port 80 as an open inbound port from the public internet. This is done by going to the Compute Instance console and viewing our web server instance. By looking at the instance we can see that we have the default Security List associated with it. If we already have a rule defined for port 80 we can just click on Add Security List and add the value. We are going to assume that we have not defined a rule and need to do so. We create a new rule that allows http traffic from the public internet to our WebServer security list. We then need to go back, add a new Security List to our instance, and select WebServer, which allows ports 80 and 22.

Step 3b and 3c: AWS and Azure

We really don't need to do anything here because both AWS and Azure gave us the ability to add a port definition in the instance creation menus. Had we selected a predefined security list there would be no step 3 for any of the services.

Surprisingly, we are done. Simple network configuration is simple for all three vendors. The key difference that we see is that Amazon and Microsoft give you the ability to define individual port rules as you create your instance. Oracle wants you to define this with Security Rules and Security Lists rather than one port at a time for each instance. All three platforms allow you to configure firewall rules ahead of time and add those as configurations. In this example we were assuming a first time experience, which is not the normal way of doing things. The one difference that did stand out is that Amazon allows you to pick and choose your subnet assignment. Oracle and Microsoft really don't give you choices and assign you an ip range. All three give you the option of static or dynamic public ip addresses. For our experiment there really isn't much difference in how any of the cloud vendors provision and administer firewall configurations.
