Monday, June 12, 2017

Oracle Data Visualization Cloud Service - upload files

Within the Oracle Cloud portfolio Oracle has positioned the Oracle Data Visualization Cloud Service as the tool to explore your data, visualize it and share your information with other people within the enterprise. The Oracle Data Visualization Cloud Service can be used as a part of a data democratization strategy within an enterprise to provide all users access to data where and whenever they need it. The concept of data democratization is discussed in another blogpost on this blog.

Currently the Oracle Data Visualization Cloud Service provides two main ways of getting data into the service. One is by connecting it to an Oracle database source, for example one located in the Oracle Database Cloud Service; the other is by uploading a file. Primarily CSV and XLSX files are supported for uploading data.

In most cases uploading files is not a best practice, as the data is relatively static and is not connected to a live data source as you would have with a database connection. However, in some cases it can be a good way to get data in. Examples are users who add their own content and do not have the means to connect to an Oracle database, or data that is relatively static anyway.

Example data from DUO
In the example below we add a relatively static piece of data to the Oracle Data Visualization Cloud Service: a year-over-year report of people dropping out of schools in the Netherlands. The data is per year, per location and per education type and is freely available as open data from the DUO website. You can locate the file for reference here.

Loading a data file
When a user wants to load data into the Oracle Data Visualization Cloud Service, the easiest way to do so from an end-user perspective is to use the GUI. Loading data involves a limited number of steps.

1) Within the Data Sources section navigate to "Create" - "Data Source". Here the type "File" is selected by default. When selected you are presented with the option to select a file on your local file system.


2) The next step, after the file is uploaded, is to verify and, if needed, modify the definition of the uploaded data.


3) After this step is completed you will find the file ready for use in your data sources, as shown below.

In effect, those are the only actions needed by a user to add data to the Oracle Data Visualization Cloud Service.

Tuesday, June 06, 2017

Oracle Cloud - The value of edge computing

A term you currently see coming up is edge computing. Edge computing is the model where you push computations and intelligence to the edge of the network or the edge of the cloud. Or, in the words of techtarget.com: “Edge computing is a distributed information technology (IT) architecture in which client data is processed at the periphery of the network, as close to the originating source as possible. The move toward edge computing is driven by mobile computing, the decreasing cost of computer components and the sheer number of networked devices in the internet of things (IoT).”

Analysts state that edge computing is a technology trend with a medium business impact which we will see surfacing in 2017.


Even though the business impact is generally seen as medium, edge computing provides a large technical benefit, and understanding its concepts can be of vital importance, especially when you are developing IoT solutions or geographically distributed systems that rely on machine-to-machine communication.

The high-level concept
From a high-level point of view, edge computing states that computations should be done as close as possible to the location where the data is generated. This implies that raw data should not be sent to a central location for computations; the basic computations should rather be done at the collection point.

As an example, if you would do license plate recognition to decide if a security gate should open for a certain car, you can architect this in a couple of ways. The “traditional” way of doing this is having a very lightweight system which would take a number of pictures as soon as a car triggers the camera. The pictures would then be sent to a central server where a license plate recognition algorithm would extract the information from the picture and compare the result against a database to decide whether to open the security gate or not.

Architecting the same functionality with edge computing would involve a number of different steps. In the edge computing model the car would trigger the camera and the pictures would be taken. A small computer, possibly embedded within the camera, would run the license plate recognition algorithm and only the result would be sent to a REST API to check if the gate should be opened or should remain closed.

The benefit of edge computing in this case is that a lot less data needs to be communicated between the camera and the central server. Instead of sending a number of high-resolution photos or even a video stream, you only have to communicate a JSON object containing the license plate information. By doing so you can limit the amount of computing power needed at the central location and at the same time improve the speed of the end-user experience.
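As a sketch of what such a call could look like (the endpoint, field names and values below are purely hypothetical and only serve as an illustration), the camera-side computer could post a small JSON object to a central gate service like this:

# hypothetical example: post the recognized plate to a central gate service
curl -s -X POST https://gate.example.com/api/v1/access-check \
  -H "Content-Type: application/json" \
  -d '{"camera_id": "gate-01", "plate": "12-ABC-3", "timestamp": "2017-06-06T10:15:00Z"}'

The payload is a few hundred bytes at most, which illustrates the difference with shipping full images or a video stream to the central server.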

A more data intensive example
The license plate recognition example is a good illustration of the concept; a bigger-scale and more data-intensive example is one using smart devices.

Such an example could be industrial (or home) equipment which relies on the data collected by a set of sensors to make decisions. An industrial example could be a smart skid in a factory responsible for ensuring that a number of liquid storage containers are always filled to a certain extent and are always at a certain temperature and mix.


Such a skid as described above involves a large set of sensors as well as a large set of potential actions on valves, pumps and heating equipment. Traditionally this was done based upon an industrial PLC in a disconnected manner, where it was not possible to centrally monitor and manage the skid.

Certain architecture blueprints state that the sensor data should be collected more centrally to ensure a more centralized management and monitoring solution. The result is that all data is sent to a central location where computations are done on the received data, and the resulting actions are communicated back again. This means that a lot of data needs to be communicated back and forth, and a loss of communication can result in a preventive shutdown of a skid.

In an edge computing architecture, the skid would be equipped with local computing power which would take care of all the computations that are in other cases done in a central location. All decision making would be done locally on the edge of the network, and a subset of the data and a log of the actions being undertaken would be sent to a central command and control server where the remote skid can be monitored and human intervention could be triggered.

In this model the loss of connectivity would not result in a preventive shutdown; operations would continue for a much longer time given the operational parameters that the edge computer holds.

Oracle Cloud and MQTT
As already mentioned in the example of the remote skid, a way to communicate the data in an IoT fashion is using MQTT. MQTT stands for Message Queue Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks. The design principles are to minimize network bandwidth and device resource requirements whilst also attempting to ensure reliability and some degree of assurance of delivery. These principles also turn out to make the protocol ideal for the emerging “machine-to-machine” (M2M) or “Internet of Things” world of connected devices, and for mobile applications where bandwidth and battery power are at a premium.

On this blog we already discussed MQTT in combination with Oracle Linux and the Mosquitto MQTT message broker.
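To give an impression of how lightweight such a message can be, the sketch below publishes a single sensor reading with the Mosquitto command line client. The broker host, topic name and payload are assumptions for illustration only and need to be replaced with your own values.

# publish one temperature reading from the skid to an MQTT broker
mosquitto_pub -h broker.example.com -p 1883 \
  -t "factory/skid42/temperature" \
  -m '{"sensor": "tank1", "celsius": 71.5}'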

To facilitate the growing number of IoT devices and the principle of edge computing relying on MQTT communication, Oracle has included MQTT in the Oracle Cloud. Oracle primarily positions MQTT in combination with the Oracle IoT Cloud Service in the form of an MQTT bridge. The Oracle IoT Cloud Service MQTT Bridge is a software application that must be installed and configured on Oracle Java Cloud Service in order to enable devices or gateways to connect to Oracle IoT Cloud Service over the MQTT communication protocol.


Within the Oracle Cloud you see the MQTT bridge as a solution to connect remote devices to the Oracle IoT Cloud Service via the MQTT protocol. The MQTT bridge receives the MQTT traffic and "translates" it to HTTPS calls which communicate with the Oracle IoT Cloud Service.

In conclusion
As already outlined in the above examples, processing a large part of the computations at the edge of the network and implementing the principles of edge computing will drastically reduce the amount of computing power and storage capacity you need in the Oracle Public Cloud. In many cases, you can rely on MQTT communication or HTTPS communication where you call a REST API.

By pushing a large part of the computations to the edge your remote systems and devices become more reliable, even in cases where network communication is not always a given, and the resulting services become faster. 

Sunday, June 04, 2017

Oracle Linux - Using Consul for DNS based service discovery

Consul, developed by HashiCorp, is a solution for service discovery and configuration. Consul is completely distributed, highly available, and scales to thousands of nodes and services across multiple datacenters. Some concrete problems Consul solves: finding the services applications need (database, queue, mail server, etc.), configuring services with key/value information such as enabling maintenance mode for a web application, and health checking services so that unhealthy services aren't used. These are just a handful of important problems Consul addresses.

When developing microservices it is important that as soon as a new instance of a microservice comes online it is able to register itself in a central registry; this process is called service registration. As soon as an instance of a microservice is registered in the central registry it can be used in the load balancing mechanism. When a call is to be made to a microservice, the service needs to be discovered via the central registry; this process is called service discovery.

In effect two ways of service discovery are common. One is based upon an API discovery model where the calling service discovers the service by executing an HTTP REST API call against the service registry and receiving an endpoint which can be used, commonly a URL based upon an IP address and a port number.

The other common way is using a DNS based lookup against a service registry. The effect of doing a DNS based lookup is that you need to ensure that all the instances of a service are always running on the same port on all instances. Enforcing the same port number might be somewhat limiting in cases where the port number of each service instance can vary.

Using Consul on Oracle Linux
In an earlier blogpost I already outlined how to install Consul on Oracle Linux. In this post I will provide a quick insight into how you can configure it on Oracle Linux. We build upon the installation done in the mentioned post.

We take the example of a service for which we have two instances; the name of the service is web and it is in effect nothing more than a simple nginx webserver running in a production instance. Every time we call the service we want to resolve it using a DNS lookup; we do however want to have this balanced, meaning we want to have a different IP returned from the DNS server.

Configure a service in consul
We configure the service "web" manually in this case by creating a JSON file in /etc/consul.d and ensure we have the information about the two instances in this JSON file. An example of the file is shown below.

{
 "services": [{
   "id": "web0",
   "name": "web",
   "tags": ["production"],
   "Address": "191.168.1.10",
   "port": 80
  },
  {
   "id": "web1",
   "name": "web",
   "tags": ["production"],
   "Address": "191.168.1.11",
   "port": 80
  }
 ]
}

As you can see, we have one name, "web", with two IDs: "web0" and "web1". The IDs are used to identify the different instances of the service web. As you can see they both have a port noted next to the address. Even though it is good practice to have this in the configuration file, it will not be used in the response from the internal Consul DNS service, as DNS will only return the addresses and not the ports.

Discover the service via Consul DNS
If we want to discover the service we can have our code undertake a lookup against the DNS server. If we have configured the underlying Oracle Linux instance to have the Consul server in the /etc/resolv.conf file this will happen almost automatically. It is important to make sure the ordering of your DNS servers is done correctly to improve resolving speed.
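One common way to wire this up on the Linux side, and this is an assumption rather than the only option, is to keep your normal DNS servers in /etc/resolv.conf and run a local dnsmasq instance that forwards only the consul domain to the Consul agent on port 8600. Assuming dnsmasq is installed, the configuration could look like this:

# forward *.consul lookups to the local Consul agent, everything else goes to the normal DNS servers
cat <<EOF> /etc/dnsmasq.d/10-consul
server=/consul/127.0.0.1#8600
EOF
service dnsmasq restart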

In effect all services configured in Consul will by default be part of .service.consul, which means that if we want to do a DNS resolve for the web service we have to resolve web.service.consul. In the below example I have Consul running on my localhost at port 8600 and I use dig to explicitly resolve against this DNS server. As stated, if you configure it correctly you do not have to explicitly call it and you should be able to do a DNS resolve as you always do.

[root@localhost consul.d]#
[root@localhost consul.d]# dig +noall +answer @127.0.0.1 -p 8600 web.service.consul 
web.service.consul. 0 IN A 191.168.1.10
web.service.consul. 0 IN A 191.168.1.11
[root@localhost consul.d]#

As you can see from the above example I will get the two addresses returned from the Consul DNS server.

Consul and load-balancing
As microservices are commonly built up out of a number of instances of the same service, we do want to ensure that load balancing is done. We can already see from the dig example above that there are two instances. However, having them always returned in the same order will not ensure that the load is balanced over the two instances.

Consul will by default do load balancing and will return the IPs in a different order by rotating them in the DNS response. In the below example you can see that this is done when we call the DNS server a couple of times.

[root@localhost consul.d]#
[root@localhost consul.d]# dig +noall +answer @127.0.0.1 -p 8600 web.service.consul 
web.service.consul. 0 IN A 191.168.1.10
web.service.consul. 0 IN A 191.168.1.11
[root@localhost consul.d]#
[root@localhost consul.d]# dig +noall +answer @127.0.0.1 -p 8600 web.service.consul 
web.service.consul. 0 IN A 191.168.1.11
web.service.consul. 0 IN A 191.168.1.10
[root@localhost consul.d]#

In conclusion
If you are running a microservices based IT footprint and you are using Oracle Linux, you can read in the referenced article how to install Consul to do service discovery and registration. Consul supports both API based and DNS based discovery. If your service instances always run on the same pre-defined port, DNS is a very good option for your service discovery process.


Oracle Cloud - Data democratization by using REST API’s

The idea of helping everybody to access and understand data is known as data democratization. Data democratization means breaking down silos and providing access to data when and where it is needed at any given moment. Striving for full data democratization within the enterprise is actually taking the step towards a data driven company and nurtures data driven decision making.

The general idea of data democratization and providing everyone in the company access to data is a very simple idea; the realization of this idea is however in many cases a very complex one. This is especially true in organically grown companies which have, over time, grown their IT footprint. In general, this includes a large set of legacy applications which do not by nature support integration that well.

However, the fact that an enterprise has a large set of legacy applications should not hold back the ambition to change to a more data driven enterprise. Moving to a more data driven enterprise, democratizing data and basing decisions on actual data is a huge benefit for the enterprise. Additionally, it is the starting point for integrating other systems and driving business in new and disruptive ways to keep the advantage over competitors.

Getting started
To get started with data democratization, the first step is to find your data and classify the data sources. The below pointers can be of importance when evaluating the data.
- Data location: where is the data located and how easily can it be accessed
- Data ownership: which department owns the data
- Data confidentiality: how confidential is this data
- Data privacy: is there privacy related data in the set
- Data value: what is the monetary value of the data
- Data alignment: how well aligned is the data with other sources
 
Taking the above questions into account when classifying all data will give you a route of action per dataset. It will help you to identify how to handle each data source, how to classify it and how to integrate it. It also helps you to prioritize it.

Moving to the cloud
Moving to a data democratization model might be a turning point in how you look at IT, and it might be a good moment to consider the use of cloud. When trying to integrate and store a large set of data you can select, as an example, the Oracle Cloud to house the data you make available to all your users.

This does not necessarily mean that you have to move the actual systems to the Oracle Cloud. One can think of a model where the backend systems remain in your current datacenter or cloud and you move / sync your data and the changes to the Oracle Cloud, where you unlock them to the users using REST APIs and portals in the form of a data shop.

Opening up with a data shop
The concept of a data shop is the way to get started with data democratization. A data shop is a self-service portal where users can gain access to all the data that you have liberated. It provides users the option to get access to REST APIs or, as an example, to the Oracle Data Visualization Cloud Service, which can show data already included in graphs and other visualizations.

As with a real shop, a large number of “products” are available. Some are for the standard users in the form of pre-defined dashboards and reports, and some users will require the data in a rawer format to make and share their own reports and analysis.

Making it easy
Making it easy for data consumers to use the data is actually twofold. You will have two types of consumers: the tech consumers and the non-tech consumers. The tech consumers will require REST APIs to gain access to the data and undertake all the actions they need and think are valuable. The other type of consumers are the non-tech users. For non-tech users the REST API approach might be too difficult to master and they will need a simpler way to gain access to the liberated data.

After you have moved your data to the cloud, as a first step in the process you will have to ensure that the data is accessible, via REST APIs and also via standard dashboards. Oracle is providing a growing number of options in the Oracle Public Cloud to do both. You can offer standard visualization and data exploration tooling to your users within the cloud, which has a relatively low learning curve and which people can start with right away. An example of this is the Oracle Data Visualization Cloud Service.


Oracle is also providing API functionality. Even though Oracle provides standard REST capabilities from within the database and with some of the cloud services, it might very well be beneficial to consider building your own REST API implementation while leveraging both the Oracle Compute Cloud Service with Oracle Linux instances and the Oracle Container Cloud Service.
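To give an impression of what this looks like for the technical consumer, the example below calls a purely hypothetical data shop endpoint and extracts the first record with jq; the hostname, dataset name and response structure are made up for illustration and will differ in your own implementation.

# retrieve a liberated dataset from a hypothetical data shop REST API and show the first record
curl -s "https://datashop.example.com/api/v1/datasets/dropout-rates?year=2016" | jq '.records[0]'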

Putting it all together
With data democratization, you open up your data, break the silo way of architecture and provide your users the option to analyze the data and make use of an active and up-to-date collection of data in one single place, the data shop. Moving to the cloud and leveraging the cloud is a technical solution to make this happen; moving to the cloud is not the goal of data democratization.

Wednesday, May 24, 2017

Oracle Linux - capture context switching in Linux

Before we dive into the subject: context switching is normal, the Linux operating system needs context switching to function, no need to worry. The question is, if it is normal, why would you like to monitor it? Because, as with everything, normal behavior is acceptable, however behavior which gets out of bounds will cause an issue. With context switching, you can expect a certain number of context switches at every given moment in time; however, when the number of context switches gets out of hand this can result in slow execution of processes for the users of the system.

Definition of context switching
The definition of context switching given by the Linux Information Project is as follows: “A context switch (also sometimes referred to as a process switch or a task switch) is the switching of the CPU (central processing unit) from one process or thread to another. A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).”

A context switch comes with a cost; it takes time and capacity to undertake the context switch. Meaning, if you can prevent a context switch this is good and will help the overall performance of the system. In effect, context switching comes in two different types: voluntary context switches and non-voluntary context switches.

Voluntary context switches
When running, a process can decide to initiate a context switch; if the decision is made by the code itself we talk about a voluntary context switch (voluntary_ctxt_switches). This can be, for example, because you voluntarily give up your execution time by calling sched_yield, or because you put a process to sleep while waiting for some event to happen.

Additionally, a voluntary context switch will happen when your computation completes before the allocated timeslice expires.

All acceptable when used in the right manner and when you are aware of the costs of a context switch.

Non-voluntary context switches
Next to the voluntary context switches we have the non-voluntary context switches (nonvoluntary_ctxt_switches). A non-voluntary context switch happens when a process becomes unresponsive; however, it also happens when the task is not completed within the given timeslice. When the task is not completed in the given timeslice the state will be saved and a non-voluntary context switch happens.

Prevent context switching
When trying to develop high performance computing solutions you should try to, at least, be aware of context switching and take it into account. Even better try to minimize the number of voluntary context switches and try to find the cause of every non-voluntary context switch.
As context switching comes with a cost you want to minimize it as much as possible. When a non-voluntary context switch happens the state needs to be saved and the task is placed back in the scheduler queue, needing to wait again for an execution timeslice. This slows down the overall performance of your system and the specific code you have written becomes even slower.

Check proc context switches
When working on Linux (we are using Oracle Linux in this example, however this applies to most systems) you can check information on context switches by looking into the status file, which can be located at /proc/{PID}/status. In the below example we check the voluntary and non-voluntary context switches of PID 25334.

[root@ce /]#
[root@ce /]# cat /proc/25334/status | grep _ctxt_
voluntary_ctxt_switches: 687
nonvoluntary_ctxt_switches: 208
[root@ce /]#

As you can see the number of voluntary context switches is (at this moment) 687 and the number of non-voluntary context switches is 208. This is a quick and dirty way of determining the number of context switches that a specific PID has had at a specific moment.
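If you want a rough idea of how fast these counters grow for a single PID, a simple, admittedly crude, approach is to read the counters twice with a sleep in between and compare the totals. The snippet below does this for the same PID; the 10 second interval is an arbitrary choice.

# show the increase in total context switches for PID 25334 over a 10 second window
before=$(awk '/_ctxt_/ {sum += $2} END {print sum}' /proc/25334/status)
sleep 10
after=$(awk '/_ctxt_/ {sum += $2} END {print sum}' /proc/25334/status)
echo "context switches in sample: $((after - before))"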

Monitor context switches
You can monitor your systems for context switching; however, you will need a good case to do so. Even though it provides information on your system, in most cases and deployments there is no real need to constantly monitor the number of context switches. Having stated that, there are also a lot of cases where monitoring context switching can be vital for ensuring the health of your server and/or comparing nodes in a wide cluster.

A quick and dirty way of monitoring your context switches is by taking a sample. For example, you could take a sample of the average number of context switches for all processes on your Linux instance that execute a context switch in the sample timeframe.

The below example script takes a short sample (two seconds in this case) of the context switches and outputs only the relevant data. For this we use the pidstat command, which can be installed by installing the sysstat package available in the Oracle Linux YUM repository.

pidstat -w 2 1 | grep Average | grep -v pidstat | sort -n -k4 | awk '{ if ($2 != "PID") print "ctxt sample:" $2" - "  $3 " - " $4 " - "  $5}'

The full example in our case looks like the one below:

[root@ce tmp]# pidstat -w 2 1 | grep Average | grep -v pidstat | sort -n -k4 | awk '{ if ($2 != "PID") print "ctxt sample:" $2" - "  $3 " - " $4 " - "  $5}'
ctxt sample:12 - 0.50 - 0.00 - watchdog/0
ctxt sample:13 - 0.50 - 0.00 - watchdog/1
ctxt sample:15 - 0.50 - 0.00 - ksoftirqd/1
ctxt sample:18 - 3.00 - 0.00 - rcuos/1
ctxt sample:2183 - 1.00 - 0.00 - memcached
ctxt sample:2220 - 1.00 - 0.00 - httpd
ctxt sample:52 - 1.00 - 0.00 - kworker/1:1
ctxt sample:56 - 1.50 - 0.00 - kworker/0:2
ctxt sample:7 - 14.00 - 0.00 - rcu_sched
ctxt sample:9 - 11.50 - 0.00 - rcuos/0
[root@ce tmp]#

To understand the output we have to look at how pidstat normally provides its output. The below is an example of the standard pidstat output:

[root@ce tmp]# pidstat -w 2 1
Linux 4.1.12-61.1.28.el6uek.x86_64 (testbox7.int)  05/23/2017  _x86_64_ (2 CPU)

03:24:37 PM       PID   cswch/s nvcswch/s  Command
03:24:39 PM         3      0.50      0.00  ksoftirqd/0
03:24:39 PM         7     14.43      0.00  rcu_sched
03:24:39 PM         9      9.45      0.00  rcuos/0
03:24:39 PM        18      3.98      0.00  rcuos/1
03:24:39 PM        52      1.00      0.00  kworker/1:1
03:24:39 PM        56      1.49      0.00  kworker/0:2
03:24:39 PM      1557      0.50      0.50  pidstat
03:24:39 PM      2183      1.00      0.00  memcached
03:24:39 PM      2220      1.00      0.00  httpd

Average:          PID   cswch/s nvcswch/s  Command
Average:            3      0.50      0.00  ksoftirqd/0
Average:            7     14.43      0.00  rcu_sched
Average:            9      9.45      0.00  rcuos/0
Average:           18      3.98      0.00  rcuos/1
Average:           52      1.00      0.00  kworker/1:1
Average:           56      1.49      0.00  kworker/0:2
Average:         1557      0.50      0.50  pidstat
Average:         2183      1.00      0.00  memcached
Average:         2220      1.00      0.00  httpd
[root@ce tmp]#

As you can see from the “script”, we print $2, $3, $4 and $5 for all average data where $2 is not “PID”. This gives us all the clear data. In our case the columns we show are the following:

$2 – the PID
$3 – number of voluntary context switches in the given sample time
$4 – number of non-voluntary context switches in the given sample time
$5 – the command name


How to use the monitor data
Collecting sample data via monitoring is great; however, when it is not used it is worthless and has to justify the costs of running the collector. As collecting the number of context switches has a cost, you need to make sure you really need the data. A couple of ways you can use the data are described below, together with their potential value in your maintenance and support effort.

Case 1 - Node comparison
This can be useful when you want to compare nodes in a wider cluster. Checking the number of context switches will be part of a wider set of checks and samples you take. The number of context switches can be a good datapoint in the overall comparison of what is happening and what the differences between nodes are.

Case 2 - Version comparison
This can be a good solution in cases where you often deploy new versions (builds / deployments) of code to your systems and want to track subtle changes in the behavior of how the systems are working.

Case 3 – Outlier detection
Outlier detection to detect subtle changes in the way the system is behaving over time. You can couple this to machine learning to detect changes over time. The number of context switches changing over time can be an indicator of a number of things and can be a pointer for a deeper investigation to tune your code.

Case 4 – (auto) scaling
Detecting the number of context switches, in combination with other datapoints can be input for scaling the number of nodes up and down. This in general is coupled with CPU usage, transaction timing and others. Adding context switching as an additional datapoint can be very valuable.

The site reliability engineering way
When applying the above, you can adopt this in your SRE (site reliability engineering) strategy as one of the inputs to monitor your systems, automatically detect trends, prevent potential issues and give feedback to developers on certain behaviour of the code in real production deployments.

Tuesday, May 09, 2017

Oracle Linux - Installing dtrace

When checking the description of dtrace for Oracle Linux on the Oracle website we can read the following: "DTrace is a comprehensive, advanced tracing tool for troubleshooting systematic problems in real time.  Originally developed for Oracle Solaris and later ported to Oracle Linux, it allows administrators, integrators and developers to dynamically and safely observe live systems for performance issues in both applications and the operating system itself.  DTrace allows you to explore your system to understand how it works, track down problems across many layers of software, and locate the cause of any aberrant behavior.  DTrace gives the operational insights that have long been missing in the data center, such as memory consumption, CPU time or what specific function calls are being made."

Which sounds great, and to be honest, using dtrace helps enormously in finding and debugging issues on your Oracle Linux system in cases where you need to go one level deeper than you would normally go to find an issue.

Downloading dtrace
If you want to install dtrace, one way to do this is by downloading the files from the Oracle website; you can find the two RPMs at this location.

Installing dtrace
When installing dtrace you might run into some dependency issues that are not that obvious to resolve. Firstly, the packages have a dependency on each other. This means you will have to install the RPM files in the right order. You can see this below:

[root@ce vagrant]# rpm -ivh dtrace-utils-0.5.1-3.el6.x86_64.rpm 
error: Failed dependencies:
 cpp is needed by dtrace-utils-0.5.1-3.el6.x86_64
 dtrace-modules-shared-headers is needed by dtrace-utils-0.5.1-3.el6.x86_64
 libdtrace-ctf is needed by dtrace-utils-0.5.1-3.el6.x86_64
 libdtrace-ctf.so.1()(64bit) is needed by dtrace-utils-0.5.1-3.el6.x86_64
 libdtrace-ctf.so.1(LIBDTRACE_CTF_1.0)(64bit) is needed by dtrace-utils-0.5.1-3.el6.x86_64
[root@ce vagrant]# rpm -ivh dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm 
error: Failed dependencies:
 dtrace-modules-shared-headers is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 dtrace-utils(x86-64) = 0.5.1-3.el6 is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 libdtrace-ctf-devel > 0.4.0 is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 libdtrace-ctf.so.1()(64bit) is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 libdtrace.so.0()(64bit) is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 libdtrace.so.0(LIBDTRACE_PRIVATE)(64bit) is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
[root@ce vagrant]# 

As you can see, you also have a number of other dependencies. The easiest way to resolve this is to simply use YUM to install both RPMs from your local machine and leverage the power of YUM to install the rest of the dependencies. For this we will use the yum localinstall command as shown below.
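Assuming both downloaded RPM files are located in the current working directory, the command looks as follows:

yum localinstall dtrace-utils-*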

Now we can quickly check if dtrace is indeed installed by executing the dtrace command without any specific option. You should see the below on your terminal:

[root@ce vagrant]# dtrace
Usage: dtrace [-32|-64] [-aACeFGhHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]
 [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
 [-x opt[=val]] [-X a|c|s|t]

 [-P provider [[ predicate ] action ]]
 [-m [ provider: ] module [[ predicate ] action ]]
 [-f [[ provider: ] module: ] func [[ predicate ] action ]]
 [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
 [-i probe-id [[ predicate ] action ]] [ args ... ]

 predicate -> '/' D-expression '/'
    action -> '{' D-statements '}'

 -32 generate 32-bit D programs and ELF files
 -64 generate 64-bit D programs and ELF files

 -a  claim anonymous tracing state
 -A  generate driver.conf(4) directives for anonymous tracing
 -b  set trace buffer size
 -c  run specified command and exit upon its completion
 -C  run cpp(1) preprocessor on script files
 -D  define symbol when invoking preprocessor
 -e  exit after compiling request but prior to enabling probes
 -f  enable or list probes matching the specified function name
 -F  coalesce trace output by function
 -G  generate an ELF file containing embedded dtrace program
 -h  generate a header file with definitions for static probes
 -H  print included files when invoking preprocessor
 -i  enable or list probes matching the specified probe id
 -I  add include directory to preprocessor search path
 -l  list probes matching specified criteria
 -L  add library directory to library search path
 -m  enable or list probes matching the specified module name
 -n  enable or list probes matching the specified probe name
 -o  set output file
 -p  grab specified process-ID and cache its symbol tables
 -P  enable or list probes matching the specified provider name
 -q  set quiet mode (only output explicitly traced data)
 -s  enable or list probes according to the specified D script
 -S  print D compiler intermediate code
 -U  undefine symbol when invoking preprocessor
 -v  set verbose mode (report stability attributes, arguments)
 -V  report DTrace API version
 -w  permit destructive actions
 -x  enable or modify compiler and tracing options
 -X  specify ISO C conformance settings for preprocessor
 -Z  permit probe descriptions that match zero probes
[root@ce vagrant]# 

All ready and set to start with dtrace on your Oracle Linux instance. As an addition, you will also have to install the below mentioned package for your specific kernel:

yum install dtrace-modules-`uname -r`
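Once the modules for your running kernel are installed, a quick way to check whether dtrace can actually talk to the kernel is to list the available probes; if this returns a list of providers and probe names instead of an error, you are good to go. The head command is only there to trim the output.

dtrace -l | head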

Oracle Linux - using pstree to find processes

When checking which processes are running on your Oracle Linux instance you can use the ps command. The ps command is most likely the most used command to find processes, and for good reasons, as it is very easy to use. However, when you want some more insight and an easier view of what is related to what, the pstree command can be very usable.

pstree shows running processes as a tree. The tree is rooted at either pid or init if pid is omitted. If a user name is specified, all process trees rooted at processes owned by that user are shown. pstree visually merges identical branches by putting them in square brackets and prefixing them with the repetition count. As an example you can see the below standard output of pstree without any additional options specified:

[root@ce tmp]#
[root@ce tmp]#  pstree
init─┬─VBoxService───7*[{VBoxService}]
     ├─acpid
     ├─crond
     ├─dhclient
     ├─httpd───10*[httpd]
     ├─java───26*[{java}]
     ├─java───47*[{java}]
     ├─java───35*[{java}]
     ├─java───29*[{java}]
     ├─java───46*[{java}]
     ├─memcached───5*[{memcached}]
     ├─6*[mingetty]
     ├─ntpd
     ├─2*[rsyslogd───3*[{rsyslogd}]]
     ├─2*[sendmail]
     ├─slapd───5*[{slapd}]
     ├─sshd───sshd───sshd───bash───sudo───su───bash───pstree
     └─udevd───2*[udevd]
[root@ce tmp]#
[root@ce tmp]# 

As you can see in the above example, httpd is shown with 10 between brackets, which indicates that 10 processes are running as httpd. Below is shown a part of the full tree (the lower part is removed for readability), this time with the -p option to include process IDs:

[root@ce tmp]# pstree -p
init(1)─┬─VBoxService(1182)─┬─{VBoxService}(1185)
        │                   ├─{VBoxService}(1186)
        │                   ├─{VBoxService}(1187)
        │                   ├─{VBoxService}(1188)
        │                   ├─{VBoxService}(1189)
        │                   ├─{VBoxService}(1190)
        │                   └─{VBoxService}(1191)
        ├─acpid(1130)
        ├─crond(1275)
        ├─dhclient(984)
        ├─httpd(3612)─┬─httpd(3616)
        │             ├─httpd(3617)
        │             ├─httpd(3618)
        │             ├─httpd(3619)
        │             ├─httpd(3620)
        │             ├─httpd(3621)
        │             ├─httpd(3622)
        │             ├─httpd(3623)
        │             ├─httpd(5020)
        │             └─httpd(5120)
        ├─java(3993)─┬─{java}(3996)
        │            ├─{java}(3997)
        │            ├─{java}(3998)

This shows the main process (PID 3612) and all other processes that are forked from this process. Using pstree is primarily (in my opinion) meant to support you when doing some investigation on a machine and is not by default the best tool to use when scripting solutions on Oracle Linux. Having stated that, it is a great tool to use.

Monday, May 08, 2017

Oracle Linux - get your external IP in bash with ifconfig.me

Developing code that helps you with automatic deployments can be a big timesaver. Automating repeating tasks for installing servers, configuring them and deploying code on them is something which is more and more adopted by enterprises as part of DevOps and continuous integration and continuous deployment methods. When you use scripting for automatic deployment of your code in your own datacenter, the beauty is that you know fairly well how the infrastructure looks and you have a fairly good view of how, for example, your machine will be accessible from the outside world. For example, if you deploy a server that has an external IP address on the outside of the network edge you should be able to determine this relatively easily, even in cases where this IP is not the IP of your actual machine.

If you however provide scripting which you distribute, you will not be able to apply the logic you might apply in your own network. For this you need some way to find out the external IP address. And, as stated, this can be something totally different than the IP which the machine actually has from your local operating system point of view.

The people at ifconfig.me have done some great work by providing a quick service to resolve this problem. ifconfig.me provides a service that will give you all the information you need (and more) in a manner that is easily included in bash scripting.

As an example, in case you would need your external IP to use in a configuration in your Oracle Linux deployment you could execute the below command:

[root@ce tmp]# curl ifconfig.me
172.217.17.46
[root@ce tmp]#

(do note, this is not my IP as I do not use a Google webhost as one of my test machines). As you can see this is relatively easy, and to provide an example of how you could include this in a bash script you can review the below code snippet:

#!/bin/bash

myIp=$(curl -s ifconfig.me)

echo $myIp

And, even though this is a very quick and easy solution to a problem you could face when you try to automate a number of steps while scripting, ifconfig.me provides more options. A number of options to get information from the "external" view are available and can all be found at the ifconfig.me website. The most important one, however, is the ability to do a curl to ifconfig.me/all.json, which will return a JSON based response with all the information in it. This makes it parsable. And to make it even easier, Oracle has included jq in the YUM repository, which makes parsing JSON even easier. An example of the JSON response from ifconfig.me is shown below (again using a fake Google webhost and not my own private information).

{
 "connection": "",
 "ip_addr": "172.217.17.46",
 "lang": "",
 "remote_host": "ams16s29-in-f46.1e100.net",
 "user_agent": "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2",
 "charset": "",
 "port": "54944",
 "via": "",
 "forwarded": "",
 "mime": "*/*",
 "keep_alive": "",
 "encoding": ""
}
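With jq installed (yum install jq), extracting only the external IP address from this JSON response in a script could, as a sketch, look like the snippet below.

# fetch the JSON variant and extract only the external IP address
externalIp=$(curl -s ifconfig.me/all.json | jq -r '.ip_addr')
echo $externalIp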

Thursday, May 04, 2017

Oracle Linux - find files after yum installation

Installing software on Oracle Linux is relatively easy using the yum command. The downside of the ease of installing is that you do get a lot of files being "dumped" on the filesystem without keeping a clear track of what is installed where exactly. Keeping your filesystem clean and understanding what ends up in which location is vital for maintaining a good working Linux instance.

A couple of options are available to keep track of what is installed where and provide you with a list of files which shows where things ended up on the filesystem.

The first option is making use of the repoquery utility. As repoquery is part of the yum-utils package, you will have to ensure that you have yum-utils installed. Basically you can check this by checking if you have the repoquery utility (which is a good hint) or you can check by using the yum command as shown below:

[root@ce ~]# yum list installed yum-utils
Loaded plugins: security
Installed Packages
yum-utils.noarch                                                                                    1.1.30-40.0.1.el6                                                                                    @public_ol6_latest
[root@ce ~]#

If you have the repoquery utility you can use it to find out which files are installed in which location. An example of this is shown below, where we check what is installed and where it is installed for the installation of yum-utils:

[root@ce ~]# 
[root@ce ~]# repoquery --installed -l yum-utils
/etc/bash_completion.d
/etc/bash_completion.d/yum-utils.bash
/usr/bin/debuginfo-install
/usr/bin/find-repos-of-install
/usr/bin/needs-restarting
/usr/bin/package-cleanup
/usr/bin/repo-graph
/usr/bin/repo-rss
/usr/bin/repoclosure
/usr/bin/repodiff
/usr/bin/repomanage
/usr/bin/repoquery
/usr/bin/reposync
/usr/bin/repotrack
/usr/bin/show-changed-rco
/usr/bin/show-installed
/usr/bin/verifytree
/usr/bin/yum-builddep
/usr/bin/yum-config-manager
/usr/bin/yum-debug-dump
/usr/bin/yum-debug-restore
/usr/bin/yum-groups-manager
/usr/bin/yumdownloader
/usr/lib/python2.6/site-packages/yumutils
/usr/lib/python2.6/site-packages/yumutils/__init__.py
/usr/lib/python2.6/site-packages/yumutils/__init__.pyc
/usr/lib/python2.6/site-packages/yumutils/__init__.pyo
/usr/lib/python2.6/site-packages/yumutils/i18n.py
/usr/lib/python2.6/site-packages/yumutils/i18n.pyc
/usr/lib/python2.6/site-packages/yumutils/i18n.pyo
/usr/sbin/yum-complete-transaction
/usr/sbin/yumdb
/usr/share/doc/yum-utils-1.1.30
/usr/share/doc/yum-utils-1.1.30/COPYING
/usr/share/doc/yum-utils-1.1.30/README
/usr/share/doc/yum-utils-1.1.30/yum-util-cli-template
/usr/share/locale/da/LC_MESSAGES/yum-utils.mo
/usr/share/man/man1/debuginfo-install.1.gz
/usr/share/man/man1/find-repos-of-install.1.gz
/usr/share/man/man1/needs-restarting.1.gz
/usr/share/man/man1/package-cleanup.1.gz
/usr/share/man/man1/repo-graph.1.gz
/usr/share/man/man1/repo-rss.1.gz
/usr/share/man/man1/repoclosure.1.gz
/usr/share/man/man1/repodiff.1.gz
/usr/share/man/man1/repomanage.1.gz
/usr/share/man/man1/repoquery.1.gz
/usr/share/man/man1/reposync.1.gz
/usr/share/man/man1/repotrack.1.gz
/usr/share/man/man1/show-changed-rco.1.gz
/usr/share/man/man1/show-installed.1.gz
/usr/share/man/man1/verifytree.1.gz
/usr/share/man/man1/yum-builddep.1.gz
/usr/share/man/man1/yum-config-manager.1.gz
/usr/share/man/man1/yum-debug-dump.1.gz
/usr/share/man/man1/yum-debug-restore.1.gz
/usr/share/man/man1/yum-groups-manager.1.gz
/usr/share/man/man1/yum-utils.1.gz
/usr/share/man/man1/yumdownloader.1.gz
/usr/share/man/man8/yum-complete-transaction.8.gz
/usr/share/man/man8/yumdb.8.gz
[root@ce ~]# 
[root@ce ~]# 

This will help you to keep track of what is installed in which location and can help in ensuring you have a clean system.
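Another option, in case you do not want to depend on repoquery, is the rpm command itself, which can list the files of any installed package straight from the RPM database:

rpm -ql yum-utils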

Sunday, April 30, 2017

Oracle Linux - Short Tip 7 - find inode number of file

Everything Linux stores as a file is handled based upon a model that makes use of inodes. The inode is a data structure in a Unix-style file system that describes a filesystem object such as a file or a directory. Each inode stores the attributes and disk block location(s) of the object's data. Filesystem object attributes may include metadata (times of last change, access, modification), as well as owner and permission data. In some cases it can be very convenient to know what the inode number is for a specific file. You can find the inode number by using the ls command or the stat command, for example.

Below you can see the ls command where we extend ls -l with i to ensure we get the inode information we need.

[vagrant@ce log]$ ls -li
total 128
1835019 -rw-r--r--  1 root root   1694 Apr 19 12:04 boot.log
1835122 -rw-------  1 root utmp      0 Apr 19 13:10 btmp
1835323 -rw-------. 1 root utmp      0 Mar 28 10:28 btmp-20170419
1835124 -rw-------  1 root root      0 Apr 28 18:21 cron
1835030 -rw-------  1 root root    250 Apr 19 12:04 cron-20170419
1835108 -rw-------  1 root root      0 Apr 19 13:10 cron-20170428
1835015 -rw-r--r--  1 root root  27726 Apr 19 12:04 dmesg
1835022 -rw-r--r--. 1 root root      0 Mar 28 10:28 dmesg.old
1837835 -rw-r--r--. 1 root root      0 Mar 28 10:28 dracut.log
1835316 -rw-r--r--. 1 root root 146292 Apr 30 12:56 lastlog
1970601 drwxr-xr-x. 2 root root   4096 Mar 28 10:28 mail
1835125 -rw-------  1 root root      0 Apr 28 18:21 maillog
1837833 -rw-------. 1 root root    181 Apr 19 12:04 maillog-20170419
1835118 -rw-------  1 root root      0 Apr 19 13:10 maillog-20170428
1835126 -rw-------  1 root root    789 Apr 30 12:54 messages
1837831 -rw-------. 1 root root  38625 Apr 19 13:10 messages-20170419
1835119 -rw-------  1 root root   5362 Apr 28 18:17 messages-20170428
1837825 drwxr-xr-x. 2 ntp  ntp    4096 Feb  6 05:58 ntpstats
1835130 -rw-------  1 root root      0 Apr 28 18:21 secure
1837832 -rw-------. 1 root root   6740 Apr 19 12:20 secure-20170419
1835120 -rw-------  1 root root      0 Apr 19 13:10 secure-20170428
1835131 -rw-------  1 root root      0 Apr 28 18:21 spooler
1835031 -rw-------  1 root root      0 Apr 19 12:04 spooler-20170419
1835121 -rw-------  1 root root      0 Apr 19 13:10 spooler-20170428
1835302 -rw-------. 1 root root      0 Mar 28 10:28 tallylog
1835128 -rw-r--r--. 1 root root      0 Mar 28 10:28 vboxadd-install.log
1835129 -rw-r--r--. 1 root root     73 Apr 19 12:04 vboxadd-install-x11.log
1835057 -rw-r--r--. 1 root root      0 Mar 28 10:28 VBoxGuestAdditions.log
1835321 -rw-rw-r--. 1 root utmp   6912 Apr 30 12:56 wtmp
1835028 -rw-------. 1 root root     64 Apr 19 12:13 yum.log
[vagrant@ce log]$

Another example of how to get the inode number is by using the stat command. The below example shows how we use stat on the boot.log file in Oracle Linux to get the inode number and other information.

[vagrant@ce log]$ stat /var/log/boot.log 
  File: `/var/log/boot.log'
  Size: 1694       Blocks: 8          IO Block: 4096   regular file
Device: fb01h/64257d Inode: 1835019     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-04-19 12:04:01.517000000 +0000
Modify: 2017-04-19 12:04:05.524262651 +0000
Change: 2017-04-19 12:04:05.524262651 +0000
[vagrant@ce log]$
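In case you only need the inode number itself, for example in a script, stat also accepts a format string so you do not have to parse the full output:

# prints only the inode number, 1835019 in the example above
stat -c %i /var/log/boot.log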

Friday, April 14, 2017

Oracle Linux - Install maven with yum

When developing Java in combination with Maven on Oracle Linux you most likely want to install Maven with a single YUM command. The issue you will be facing is that Oracle does not by default provide Maven in the YUM repository for Oracle Linux. The workaround for this is to make use of the Fedora YUM repository. This means that you have to ensure that you add a Fedora repository to your list of YUM repositories.

As soon as you have done so you can make use of a standard YUM command to take care of the installation.

The below steps showcase how you can add the yum repository from repos.fedorapeople.org; after that you can execute the yum install command.

wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

yum install -y apache-maven

This should result in the installation of Maven on your Oracle Linux instance and should enable you to start developing on Oracle Linux with Maven. To check if the installation has gone correctly you can execute the below command, which will show you the information on the version of Maven.

[root@localhost tmp]#
[root@localhost tmp]# mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Maven home: /usr/share/apache-maven
Java version: 1.8.0_121, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_121/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.1.12-61.1.28.el6uek.x86_64", arch: "amd64", family: "unix"
[root@localhost tmp]#
[root@localhost tmp]#

Oracle Linux - Download Java JDK with wget

When working with Oracle Linux it might very well be that you do not have a graphical user interface. In case you need to download something, most commonly you will be using wget or curl. In most cases that works perfectly fine; in some cases however this is not working as you might expect. One of the issues a lot of people complain about is that they want to download the Java JRE or Java JDK from the Oracle website using wget or curl. When executing a standard wget command however, they run into the issue that the response they get is not the rpm (or package) they want. Instead the content of the file is HTML.

The reason for this is that the page that controls the download works with cookies that force you to accept the license agreement prior to downloading. As you most likely will not have a graphical user interface you cannot use a browser to download it.

This means that we need to trick the Oracle website into believing we have clicked the button to agree with the license agreement, and serve the file instead of HTML code. The command you can use for this is shown below:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/jdk-8u121-linux-x64.rpm

We now instructed wget to send a header which states "Cookie: oraclelicense=accept-securebackup-cookie". This header is what Oracle needs in order to serve the rpm file itself.

Now you can install the JDK by making use of the rpm -ivh command. This should ensure you have it installed on your Oracle Linux system. As a check you can execute the below command, which should tell you exactly what is now installed on the system:

[root@localhost tmp]#
[root@localhost tmp]# java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[root@localhost tmp]#
[root@localhost tmp]#

Monday, April 10, 2017

Oracle Linux - use Vagrant to run Oracle Linux 6 and 7

Vagrant is an open-source software product built by HashiCorp for building and maintaining portable virtual development environments. The core idea behind its creation lies in the fact that environment maintenance becomes increasingly difficult in a large project with multiple technical stacks. Vagrant manages all the necessary configurations for the developers in order to avoid unnecessary maintenance and setup time, and increases development productivity. Vagrant is written in the Ruby language, but its ecosystem supports development in almost all major languages.

As part of the Vagrant ecosystem people can create Vagrant boxes and share them with others. We have already seen companies package solutions in virtual images to be used to provide customers with showcases of a working end-to-end solution. Even though this is a great way of sharing standard images of an operating system including packaged products, Vagrant is more intended to provide custom images, boxes, to be used by developers for example.

Oracle Linux is already available within the Vagrant boxes shared by the community. You can search Atlas from HashiCorp for boxes built upon Oracle Linux in combination with Oracle VirtualBox. Even though that is great news, it has now improved with Oracle also providing official Vagrant boxes from the Oracle website.

At this moment you can download official Oracle Linux boxes from yum.oracle.com/boxes for Oracle Linux 7.3, 6.8 and 6.9.

As an example of how to get started, the below commands show the process to get a running Oracle Linux 6.8 box on a MacBook with Vagrant already installed.

Step 1 Download the box and add it to vagrant
vagrant box add --name ol6 http://yum.oracle.com/boxes/oraclelinux/ol68/ol68.box

Step 2 Initialize the box (in a directory where you want it)
vagrant init ol6

Step 3 start the vagrant box (virtualbox)
vagrant up

Step 4 Login to the virtual machine.
vagrant ssh

For those who like to see a short screencast, you can watch the below screencast of the process.


We have not shown the process of installing Vagrant itself; you can find the instructions on how to install Vagrant on your local machine on the Vagrant website. Having Vagrant on your local machine will help you to speed up the rapid prototyping of new ideas without the need to install and re-install a virtual machine every time.

Having your own private Vagrant boxes within an enterprise and providing them to your developers, enabling them to work with virtual machines that match the machines you will deploy in your datacenter, will speed up the process for developers and removes the time needed to install and re-install virtual machines. It makes sure developers can focus on what they want to focus on: developing solutions and coding.

Monday, April 03, 2017

Oracle Linux - Install Neo4j

Neo4j is a graph database. A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in many cases retrieved with one operation.

This contrasts with conventional relational databases, where links between data are stored in the data, and queries search for this data within the store and use the join concept to collect the related data. Graph databases, by design, allow simple and fast retrieval of complex hierarchical structures that are difficult to model in relational systems. Graph databases are similar to 1970s network model databases in that both represent general graphs, but network-model databases operate at a lower level of abstraction and lack easy traversal over a chain of edges.

When developing a solution which heavily depends on the relationships between data points, the choice for a graph database such as Neo4j is a good one. Examples of such an application can be a solution where you need to gain insight into the relationships between historical events, the relationships between people and actions, or the relationships between events in a complex system. The last might be an alternative way of logging in a distributed microservice architecture based solution.

Install Neo4j on Oracle Linux
For those who would like to set up Neo4j and get started with it to explore the options it might give your company, the below short set of instructions shows how to set up Neo4j on Oracle Linux. For those who use Red Hat, the instructions below will most probably also work on Red Hat Linux. However, the installation was done and tested on Oracle Linux.

The first thing we need to do is ensure we are able to use yum for the installation of Neo4j on our system. Other ways of obtaining Neo4j are also available and can be used, however the yum way of doing things is the easiest and provides the quickest result. A word of caution: Neo4j currently states that the yum based installation is experimental; we have however not found any issues while using yum.

To ensure we have the GPG key associated with the Neo4j yum repository we have to import it. Shown below is an example of how you can download the key.

[root@oracle-65-x64 tmp]#
[root@oracle-65-x64 tmp]# wget http://debian.neo4j.org/neotechnology.gpg.key
--2017-04-02 13:14:07--  http://debian.neo4j.org/neotechnology.gpg.key
Resolving debian.neo4j.org... 52.0.233.188
Connecting to debian.neo4j.org|52.0.233.188|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4791 (4.7K) [application/octet-stream]
Saving to: “neotechnology.gpg.key”

100%[===========================================>] 4,791 --.-K/s   in 0.005s 

2017-04-02 13:14:08 (1.01 MB/s) - “neotechnology.gpg.key” saved [4791/4791]
[root@oracle-65-x64 tmp]#

As soon as we have obtained the key by downloading it from the Neo4j download location, we can import the key by using the import option of the rpm command as shown below:

[root@oracle-65-x64 tmp]#
[root@oracle-65-x64 tmp]# rpm --import neotechnology.gpg.key
[root@oracle-65-x64 tmp]#

Having the key will help validate the packages we download from the Neo4j repository to be installed on our Oracle Linux machine. We do however have to ensure yum is able to locate the Neo4j repository. This is done by adding a repo file to the yum repository directory. Below is shown how the repository is added to the yum configuration; this command will create the file neo4j.repo in /etc/yum.repos.d where yum can locate it and use it to include the repository as a valid repository to search for packages.

cat <<EOF> /etc/yum.repos.d/neo4j.repo
[neo4j]
name=Neo4j Yum Repo
baseurl=http://yum.neo4j.org/stable
enabled=1
gpgcheck=1
EOF
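As a quick, optional sanity check you can verify that yum now sees the new repository before moving on; the grep pattern below assumes the repository id [neo4j] as defined in the file above.

yum repolist enabled | grep -i neo4j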

Having both the key and the repository present on your system will enable you to use yum for the installation of Neo4j. This means you can now use a standard yum command to install Neo4j on Oracle Linux, an example of the command is shown below.

[root@oracle-65-x64 tmp]#
[root@oracle-65-x64 tmp]# yum install neo4j

This should ensure you have Neo4j installed on your Oracle Linux instance.

Configuring Neo4j
As soon as you have completed the installation, a number of tasks need to be executed to ensure you have a properly working Neo4j installation.

By default Neo4j will not allow external connections to be made. This means that you can only connect to Neo4j by using the 127.0.0.1 address of the localhost. Even though this might very well be enough for a development or local test environment, this is not what you want when deploying a server. It will be required that the Neo4j instance is also accessible from the outside world. This requires a configuration change to the Neo4j configuration file. The standard location for the configuration file, when Neo4j is deployed on an Oracle Linux machine, is /etc/neo4j. In this location you will notice the neo4j.conf file which holds all the configuration data for the Neo4j instance.

By default the below line is commented out. Ensure you uncomment the line; this should ensure that Neo4j will accept non-local connections:

dbms.connectors.default_listen_address=0.0.0.0
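If you prefer to script this change rather than edit the file by hand, a minimal sketch using sed is shown below; it assumes the line is present in /etc/neo4j/neo4j.conf in its default commented form.

# uncomment the listen address setting so Neo4j accepts non-local connections
# (assumes the setting is present, commented out with a leading #, in the default config)
sed -i 's|^#dbms.connectors.default_listen_address=0.0.0.0|dbms.connectors.default_listen_address=0.0.0.0|' /etc/neo4j/neo4j.conf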

Additionally you want Neo4j to start during boot. For this you will have to ensure Neo4j is registered as a service and activated. You can do so by executing the below command:

[root@oracle-65-x64 tmp]#
[root@oracle-65-x64 tmp]# chkconfig neo4j on

Now Neo4j should be registered as a service that will start automatically when the machine boots. You can verify this by using the below command.

[root@oracle-65-x64 tmp]# chkconfig --list | grep neo4j
neo4j          0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@oracle-65-x64 tmp]#

This however does not mean your Neo4j instance is running; you will have to start it manually the first time after installation. To check the status of Neo4j on Oracle Linux you can use the below command:

[root@oracle-65-x64 ~]# service neo4j status
neo4j is stopped
[root@oracle-65-x64 ~]#

To start it you can use the below command:

[root@oracle-65-x64 ~]# service neo4j start
[root@oracle-65-x64 ~]# service neo4j status
neo4j (pid  5643) is running...
[root@oracle-65-x64 ~]#

Now you should have a running Neo4j installation on your Oracle Linux instance which is ready to be used. You should now also be able to go to the web interface of Neo4j and start using it.
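A quick way to confirm the web interface is reachable from a remote machine is a simple HTTP check; the example below assumes the default Neo4j HTTP port 7474 and uses a placeholder hostname.

# replace neo4j-server.example.internal with the address of your Oracle Linux machine
curl -i http://neo4j-server.example.internal:7474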



Neo4j in the Oracle Cloud
When running Neo4j in the Oracle Cloud, the main installation of Neo4j is already described in the section above. A number of additional things need to be taken into consideration when deploying it within the Oracle Public Cloud.

When deploying Neo4j in the Oracle Cloud you will deploy it using the Compute Cloud Service within the Oracle Public Cloud. In the Compute Cloud Service you have the option to provision an Oracle Linux machine, and using the above instructions you will have a running machine in the Oracle Cloud within a couple of minutes.

The main points you need to consider are around how to set up your network security within the Oracle Cloud. This also ties into the overall design: who can access Neo4j, which ports should be open and which routes should be allowed.

The way the Oracle Cloud works with networks, firewalls and zone configuration is a bit different from how it is represented in a standard environment. However, even though the Oracle Compute Cloud Service uses some different terms and different ways of doing things, it provides you with exactly the same building blocks as a traditional IT deployment to do proper zone configuration and shield your database and applications from unwanted visitors.
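Next to the security rules you configure in the Oracle Cloud itself, the operating system firewall on the instance also has to allow the traffic. As a rough sketch, and assuming the default Neo4j ports (7474 for the HTTP interface and 7687 for the Bolt protocol), opening them in iptables on Oracle Linux 6 could look as follows.

# allow the (assumed) default Neo4j ports through the local iptables firewall on Oracle Linux 6
iptables -I INPUT -p tcp --dport 7474 -j ACCEPT   # HTTP interface
iptables -I INPUT -p tcp --dport 7687 -j ACCEPT   # Bolt protocol
service iptables save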

A general advice when deploying virtual machines in the Oracle Public Cloud is to plan ahead and ensure you have your entire network and network security model mapped out and configured prior to deploying your first machine.

For the rest, using the Oracle Cloud for a Neo4j installation is exactly the same as it would be in your own datacenter, with the exception that you can make use of the flexibility and speed of the Oracle Cloud.

Thursday, March 23, 2017

Oracle Cloud - Architecture Blueprint - microservices transport protocol encryption

The default way microservices communicate with each other is based upon the HTTP protocol. When one microservice needs to call another microservice it will initiate a service call based upon an HTTP request. The HTTP request can use any of the standard methods defined in the HTTP standard, such as GET, POST and PUT. In effect this is a good mechanism and enables you to use all of the standards defined within HTTP. The main issue with HTTP is that it is clear text and by default will not have encryption enabled.

The reality one has to deal with is that the number of microservice instances can be enormous, and the number of possible connections can be enormous in a complex landscape. This also means that each possible path, each network connection, can potentially be intercepted. Having no HTTPS/SSL encryption implemented makes intercepting network traffic much easier.



It is a best practice to ensure all of your connections are encrypted by default; to do so you will need to make use of HTTPS instead of HTTP. Building your microservices deployment to only work with HTTPS and not with HTTP brings in a couple of additional challenges.

The challenge of scaling environments
In a microservices oriented deployment, containers or virtual machines that provide instances of a microservice will be provisioned and de-provisioned in a matter of seconds. The issue that comes with this in relation to using HTTPS instead of HTTP is that you want to ensure that all HTTPS connections between the systems are based upon valid certificates which are created and controlled by a central certificate authority.

Even though it is possible to have each service that is provisioned generate and sign its own certificate, this is not advisable. Using self-signed certificates is generally considered an insecure way of doing things. Most standard implementations of negotiating encryption between two parties do not see a self-signed certificate as a valid level of security. Even though you can force your code to accept a self-signed certificate and make it work, you will only be able to ensure encryption on the protocol level is negotiated and used; you will not be able to be fully assured that the other party is not a malicious node owned by an intruder.

To ensure that all instances can verify that the other instance they call is indeed a trusted party, and to ensure that encryption is used in the manner it is intended, you will have to make use of a certificate authority. A certificate authority is a central "bookkeeper" that will provide certificates to parties needing one, and it will provide the means to verify that a certificate offered during encryption negotiation is indeed a valid certificate and belongs to the instance that provides it.

The main issue with using a certificate authority to provide signed certificates is that you will have to ensure that you have a certificate authority service in your landscape capable of generating and providing new certificates directly when this is needed.

In general, looking at the common way certificates are signed and handed out, it is a tiresome process which might involve third parties and/or manual processing. Within an environment where signed certificates are needed directly and on the fly this is not a real option. This means that requesting signed certificates from the certificate authority needs to be direct and preferably based upon a REST API.

Certificate authority as a service
When designing your microservices deployment while making use of HTTPS and certificates signed by a certificate authority, you will need to have the certificate authority as a service. The certificate authority as a service should enable services to request a new certificate when they are initialized. A slight alternative is that your orchestration tooling requests the certificate on behalf of the service that needs to be provisioned and provides the certificate during the provisioning phase.

In both cases you will need to have the option to request a new certificate, or request a certificate revocation when the service is terminated, via a REST API.

The below diagram shows on a high level the implementation of a certificate authority as a service which enables (in this example) a service instance to request a signed certificate to be used to ensure the proper way of initiating HTTPS connections with assured protocol level encryption.


To ensure a decoupling between the microservices and the certificate authority we do not allow direct interaction between the microservice instances and the certificate authority. From a security point of view and a decoupling and compartmentalizing point of view this is a good practice and adds additional layers of security within the overall footprint.

When a new instance of a microservice is being initialized, this can be as a Docker container in the Oracle Container Cloud Service or as a virtual machine instance in the Oracle Compute Cloud Service, the initialization will ask the certificate microservice for a new, signed certificate.

The certificate microservice will request a new certificate by calling the certificate authority's REST API on behalf of the initiating microservice. The answer provided back by the certificate authority is passed through by the certificate microservice to the requesting party. In addition to just being a proxy, it is good practice to ensure your certificate microservice performs a number of additional verifications to see if the requesting party is authorized to request a certificate, and to ensure the right level of auditing and logging is done to provide an audit trail.

Giving the CA a REST API
When exploring certificate authority implementations and solutions it will become apparent that they have, in general, been developed without a REST API in mind. As the concept of the certificate authority was already in place long before microservice concepts came into play, you will find that integration options are not that readily available.

An exception to this is the CFSSL (CloudFlare Secure Socket Layer) project on GitHub. The CFSSL project provides an open-source and free PKI toolkit which offers a full set of rich REST APIs to undertake all required actions in a controlled manner.

As an example, the creation of a new certificate can be done by sending a JSON payload to the CFSSL REST API; the return message will consist of a JSON document which contains the cryptographic material needed to ensure the requesting party can enable HTTPS. Below you will see the JSON payload you can send to the REST API. This is a specific request for a certificate for the ms001253 instance located in the Oracle Compute Cloud Service.

{
 "request": {
  "CN": "ms001253.compute-acme.oraclecloud.internal",
  "hosts": ["ms001253.compute-acme.oraclecloud.internal"],
  "key": {
   "algo": "rsa",
   "size": 2048
  },
  "names": [{
   "C": "NL",
   "ST": "North-Holland",
   "L": "Amsterdam",
   "O": "ACME Inc."
  }]
 }
}
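As an illustration of how such a request could be sent, the snippet below posts the payload above to the newcert endpoint with curl. The hostname, the port (CFSSL's built-in API server listens on port 8888 by default) and the file name request.json are assumptions for this sketch.

# send the certificate request shown above (saved as request.json) to the CFSSL newcert endpoint
curl -s -X POST -H "Content-Type: application/json" \
     -d @request.json \
     http://cfssl.acme.internal:8888/api/v1/cfssl/newcert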

As a result you will be given back a JSON payload containing all the required information. Due to the way CFSSL is built, you will have the response almost instantly. The combination of having the option to request a certificate via a call to a REST API and getting the result back directly makes it very usable for cloud implementations where you scale the number of instances (VMs, containers, ...) up or down all the time.

{
 "errors": [],
 "messages": [],
 "result": {
  "certificate": "-----BEGIN CERTIFICATE-----\nMIIDRzCCAjGgAwIBAg2 --SNIP-- 74m1d6\n-----END CERTIFICATE-----\n",
  "certificate_request": "-----BEGIN CERTIFICATE REQUEST-----\nMIj --SNIP-- BqMtkb\n-----END CERTIFICATE REQUEST-----\n",
  "private_key": "-----BEGIN EC PRIVATE KEY-----\nMHcCAQEEIJfVVIvN --SNIP-- hYYg==\n-----END EC PRIVATE KEY-----\n",
  "sums": {
   "certificate": {
    "md5": "E9308D1892F1B77E6721EA2F79C026BE",
    "sha-1": "4640E6DEC2C40B74F46C409C1D31928EE0073D25"
   },
   "certificate_request": {
    "md5": "AA924136405006E36CEE39FED9CBA5D7",
    "sha-1": "DF955A43DF669D38E07BF0479789D13881DC9024"
   }
  }
 },
 "success": true
}

The API endpoint for creating a new certificate is /api/v1/cfssl/newcert, however CFSSL provides a lot more API calls to undertake a number of actions. One of the reasons for implementing the intermediate microservice is that it can ensure that clients cannot initiate some of those API calls, without the need to change the way CFSSL is built.

The below overview shows the main API endpoints that are provided by CFSSL. A full set of documentation on the endpoints can be found in the CFSSL documentation on GitHub.

  • /api/v1/cfssl/authsign
  • /api/v1/cfssl/bundle
  • /api/v1/cfssl/certinfo
  • /api/v1/cfssl/crl
  • /api/v1/cfssl/info
  • /api/v1/cfssl/init_ca
  • /api/v1/cfssl/newcert
  • /api/v1/cfssl/newkey
  • /api/v1/cfssl/revoke
  • /api/v1/cfssl/scan
  • /api/v1/cfssl/scaninfo
  • /api/v1/cfssl/sign


Certificate verification
One of the main reasons we stated that one should not use self-signed certificates, and why you should use certificates from a certificate authority, is that you want to have the option of verification.

When conducting a verification of a certificate, checking if the certificate is indeed valid and by doing so getting an additional level of trust, you will have to verify the certificate received from the other party with the certificate authority. This is done based upon OCSP, the Online Certificate Status Protocol. A simple high level example of this is shown in the below diagram:

Within the high level diagram as shown above you can see that:

  • A service will request a certificate from the certificate microservice during the initialization phase
  • The certificate microservice requests a certificate on its behalf at the certificate authority
  • The certificate authority sends the certificate back to the certificate microservice, after which it is sent to the requesting party
  • The requesting party uses the response to include the certificate in the configuration to allow HTTPS traffic


As soon as the instance is up and running it is eligible to receive requests from other services. As an example: if example service 0 calls example service 2, the first response during encryption negotiation would be that example service 2 sends back a certificate. If you have an OCSP responder in your network, example service 0 can contact the OCSP responder to check the validity of the certificate received from example service 2. If the response indicates that the certificate is valid, one can assume that a secured connection can be made and the other party can be trusted.
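To make the OCSP check a bit more tangible, the command below shows how such a verification could look with the openssl client; the file names and the OCSP responder URL are placeholders for this example.

# verify the certificate received from example service 2 against the OCSP responder
openssl ocsp -issuer ca.pem -cert service2.pem \
        -url http://ocsp.acme.internal -CAfile ca.pem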

Conclusion
Implementing and enforcing that only encrypted connections are used between services is a good practice and should be at the top of your list when designing your microservices based solution. One should include this in the first stage and within the core of the architecture. Trying to implement a core security functionality at a later stage is commonly a cumbersome task.

Ensuring you have all the right tools and services in place to ensure you can easily scale up and down while using certificates is something that is vital to be successful.

Even though it might sound relatively easy to ensure HTTPS is used everywhere and in the right manner, it will require effort to ensure it is done in the right way so that it becomes an asset and not a liability.

When done right it is an ideal addition to a set of design decisions for ensuring a higher level of security in microservice based deployments.

Wednesday, March 22, 2017

Oracle Linux - Short Tip 6 - find memory usage per process

Everyone operating an Oracle Linux machine, or any other operating system for that matter, will at a certain point have to look at memory consumption. The first question during a memory optimization project is: which process is currently using how much memory? Linux provides a wide range of tools and options to gain insight into all facets of system resource usage.

For those who "just" need quick insight into the current memory consumption per process on Oracle Linux, the below command can be extremely handy:

ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'

It will provide a quick overview of the current memory consumption in MB per process.

[root@devopsdemo ~]# ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
         0.00 Mb COMMAND
       524.63 Mb /usr/sbin/console-kit-daemon --no-daemon
       337.95 Mb automount --pid-file /var/run/autofs.pid
       216.54 Mb /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
         8.81 Mb hald
         8.46 Mb dbus-daemon --system
         8.36 Mb auditd
         2.14 Mb /sbin/udevd -d
         2.14 Mb /sbin/udevd -d
         1.38 Mb crond
         1.11 Mb /sbin/udevd -d
         1.04 Mb ps -eo size,pid,user,command --sort -size
         0.83 Mb sshd: root@pts/0
         0.74 Mb cupsd -C /etc/cups/cupsd.conf
         0.73 Mb qmgr -l -t fifo -u
         0.73 Mb login -- root
         0.65 Mb /usr/sbin/abrtd

The overview is extremely useful when you need to quickly find the processes that consume the most memory, or memory consuming processes which are not expected to use (this much) memory.
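As a side note, the size column used above reports an approximation rather than resident memory. If resident memory (RSS) is what you are after, a minimal variant of the same command could look as follows.

ps -eo rss,pid,user,command --sort -rss | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'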

Tuesday, March 21, 2017

Oracle Cloud - architecture blueprint - Central logging for microservices

When you engage in developing a microservices architecture based application landscape, at some point the question about logging will become apparent. When starting to develop with microservices you will see that there are some differences with monolithic architectures that will drive you to rethink your logging strategy. Where in a monolithic architecture you have one central server, or a cluster of servers, where the application is running, in a microservices architecture you will have n nodes, containers, instances and services.

In a monolithic architecture you will see that most business flows run within a single server, and end-to-end logging will be relatively simple to implement and later to correlate and analyze. If we look at the below diagram you will see that a call to the API gateway can result in calls to all available services as well as to the service registry. This also means that the end-to-end flow will be distributed over all the different services, and logging will for some parts also be done on each individual node and not in one central node (server), as is the case in a monolithic application architecture.



When deploying microservices in, for example, the Oracle Public Cloud Container Cloud Service, it is a good practice to ensure that each individual Docker container as well as each microservice pushes its logging to a central API which will receive the log data in a central location.

Implement central logging in the Container Cloud Service
The difference between the logging from the microservice and from the Docker container deployed in the Oracle Public Cloud Container Cloud Service is that the microservice sends service-specific logging, developed as part of the service itself, to a central logging API. This can include technical logging as well as functional business flow logging which can be used for auditing.

In some applications the technical logging is deliberately separated from the business logging. This is to ensure that business information is not available to technical teams and can only be accessed by business users who need to undertake an audit.

Technical logging on the container level is more the lower level logging which is generated by Docker and the daemons providing the services needed to run the microservice.


The above diagram shows the implementation of an additional microservice for logging. This microservice provides a REST API capable of receiving JSON based logging, ensuring that all microservices can push their logging to this single place.

When developing the mechanism which will push the log information, or audit information, to the logging microservice, it is good to ensure that this is a forked logging implementation. More information on forked logging, and how to implement it while preventing execution delay in high speed environments, can be found in this blogpost where we illustrate this with a bash example.
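A minimal sketch of what such a forked call could look like in bash is shown below; the endpoint URL of the logging microservice and the payload fields are assumptions for this example.

# push a JSON log line to the (hypothetical) logging microservice in the background,
# so the calling process does not wait for the HTTP call to complete
LOG_API="http://logging.acme.internal/api/v1/log"
PAYLOAD='{"service":"example-service-0","level":"INFO","message":"order processed"}'
curl -s -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$LOG_API" > /dev/null 2>&1 &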

Centralize logging with Oracle Management Cloud
Oracle provides, as part of the Public Cloud portfolio, the Oracle Management Cloud, and as part of that it provides Log Analytics. When developing a strategy for centralized logging of your microservices, you can have the receiving logging microservice push all logs to a central consolidation server in the Oracle Compute Cloud. You can then have the Oracle Management Cloud Log Analytics service collect this and include it in the service provided by Oracle.

An example of this architecture is shown on a high level in the below diagram.


The benefit of the Oracle Management Cloud is that it provides an integrated solution which can be combined with other systems and services running in the Oracle Cloud, any other cloud or your traditional datacenter.


An example of the interface which is provided by default by the Oracle Management Cloud is shown above. This framework can be used to collect logging and analyze it for your Docker containers and your microservices, as well as for other services deployed as part of the overall IT footprint.

The downside for some architects and developers is that you have to comply with a number of standards and methods defined in the solution by Oracle. The upside is that a large set of analysis tooling and intelligence is pre-defined and available out of the box.

Centralize logging with the ELK stack
Another option to consolidate logging is making use of non-Oracle solutions. Splunk comes to mind, however, for this situation the ELK stack might be more appropriate. The ELK stack consists of Elasticsearch, Logstash and Kibana, complemented with Elastic Beats and the standard REST APIs.

The ELK stack provides a lot more flexibility to developers and administrators, however it requires more understanding of how to work with ELK. The below image shows a high level representation of the ELK stack in combination with Beats.


As you can see in the above image there is a reservation for a {Future}beat. This is also the place where you can deploy your own developed Beat; you can also use this method to do a direct REST API call to Logstash or directly to Elasticsearch. When developing logging for microservices it might be advisable to store the log data directly into Elasticsearch from within the code of the microservice. This might result in a deployment as shown below, where the ELK stack components, including Kibana for reporting and visualization, are deployed in the Oracle Compute Cloud Service.
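As a rough illustration of storing a log document directly into Elasticsearch, the call below uses the standard index API; the host name, index name and document structure are assumptions for this sketch.

# index a single log document directly into Elasticsearch (host and index are placeholders)
curl -s -X POST -H "Content-Type: application/json" \
     -d '{"service":"example-service-0","level":"ERROR","message":"payment service timeout","timestamp":"2017-03-21T10:15:00Z"}' \
     http://elastic.acme.internal:9200/microservice-logs/log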

This will result in a solution where all log data is consolidated in Elasticsearch and you can use Kibana for analysis and visualization. You can see a screenshot from Kibana below.


The upside of using the ELK stack is that you will have full freedom and possibly more ease in developing direct integrations. The downside is that you will need to do more yourself and need a deeper knowledge of your end-to-end technology (not sure if that is a really bad thing).

Conclusion
When you start developing an architecture for microservices you will need to have a fresh look at how you will do logging. You will have to understand the needs of both your business and your DevOps teams. Implementing logging should be done in a centralized fashion to ensure you have good insight into the end-to-end business flow as well as all technical components.

The platform you select for this will depend on a number of factors. Both solutions outlined in the above post show you some of the benefits and some of the downsides. Selecting the right solution will require some serious investigation. Ensuring you take the time to make this decision will pay back over time and should not be taken lightly.