Maven Repository Managers for the Enterprise
If you use Maven, or even if you just use Maven repositories for your dependency management, you should be using a Maven Repository Manager. It's like using a kayak without paddles: you'll get there eventually without them, but your life will be much easier if you are properly equipped.
In this article, we'll be looking at some of the things a repository manager can do for you. A correctly-configured repository manager can speed up your builds, save bandwidth, help you share artifacts within your organization, and give you better control as to what dependencies are used in your projects and where they are coming from. It can also play a key role in your development infrastructure, helping you set up a fully-blown automated build and deployment pipeline.
There are currently three main Maven repository managers on the market. Archiva is a light-weight repository manager in the Apache fold. Artifactory is a powerful and quite innovative product from JFrog. And Nexus is a powerful, flexible, and superbly documented repository manager, this time from Sonatype, the company behind Maven. All are free and open source, though both Nexus and Artifactory have professional versions with extra features aimed at enterprise requirements. Most of the examples in this article will be using Nexus and it's commercial cousin Nexus Pro, though many of the techniques and general approaches are also applicable to the other products.
So read on, and learn how a repository manager can simplify your life as a software developer! And if you are already using a repository manager, this article might give you some tips as to how to get the most out of your repository.
Why use a Repository Manager
Probably the most fundamental role of a repository manager is to act as a proxy/cache between you and the internet. One of the most important features of Maven is the notion of declarative dependency management. Indeed, in a Maven project, you don't store the JAR files your project needs in a
lib directory within your project - rather you list the dependencies you need directly in your build script. Ant users can also use declarative dependency management, either by using the Maven ant libraries or by using Ivy.
By default, Maven will attempt to download any dependencies it needs from public repositories on the internet, such as the Maven Central repository or the Codehaus repository. Any given jar file is only downloaded once, and cached on your workstation, no matter how many projects need it. And that's just fine if you are working alone. However, software development is a collaborative game, and most of us work in teams. So every developer will have to download the dependencies from the internet separately. In addition, Maven itself also downloads its own dependencies, so in a team of any size the quantity of downloaded files, and the time taken to download them, can mount up very quickly.
The first role of any Maven repository manager is to optimize this process (see Figure 1). It sits between your developer workstation and the internet repositories. Rather than going directly to the internet to download the required dependencies, Maven goes to the repository manager. Repository managers cache the files they download, so if the repository manager has already downloaded the dependency before, it can serve it out directly. If not, it goes to the internet to get it. In terms of overall build performance and consumed bandwidth, the gains are enormous.
Figure 1. A Repository Manager acts as a proxy server between you and the internet
Finer control on your downloads
But the benefits don't stop at economizing bandwidth. Once the repository manager becomes the central point of access for Maven dependencies within your organization, you can also use it to control where these dependencies come from, in much the same way as an internet proxy controls which web sites users can view. Indeed, many organizations like to have a clear visibility and control over what public repositories developers are using, and a repository manager can provide just that. Developers can no longer add arbitrary public repositories into their pom files - all dependencies must come from the repository manager, and all public repositories are configured and managed inside the repository manager. This also has the pleasant side effect of simplifying the repository configuration on the developer machines: you no longer need to maintain a long list of public repositories in each developer's
This is typically done using the notion of Groups (in Nexus) or Virtual Repositories (in Artifactory). Repositories are organized into groups, so that a single URL (such as 'http://myserver:8081/nexus/content/groups/public/') provides access to an arbitrary list of real repositories (see Figure 2). From the point of view of the Maven user, all dependencies come from this single URL. However, behind the scenes, you can associate (and manage) many physical repositories. This gives organizations much better control over which repositories they want to authorize.
Figure 2. Repository Managers hide several public and internal repository URLs behind a single point of access.
In some organizations, such as in Defense or in the medical sector, requirements are even stricter. Developers cannot add arbitrary new versions of Spring or Hibernate, for example: new libraries must first be approved by an Enterprise Architect or Security Specialist before they can be deployed to production. Some organizations even require vetting of new libraries before they can be used by development teams. Another common requirement is to be able to vet artifacts provided by external vendors before using them internally.
A repository manager can help to rationalize and organize this process. In organizations like this, developers may only be allowed to use a repository containing approved dependencies, for example. Using Artifactory, you could achieve this by copying or moving artifacts between repositories by hand. The Nexus Procurement Suite, available in Nexus Pro, provides more sophisticated support for several procurement scenarios, including both approving individual JAR files and verifying entire configurations before they go into production.
Figure 3. Using a Repository Manager to allow only selected JAR files to be used for development.
This centralized management also opens the way to other forms of build optimization. For example, Nexus also has the notion of Routes, a very flexible feature which lets you give Nexus hints about where certain artifacts can be found. The latest version of Artifactory has a similar concept in the form of Include/Exclude patterns for each remote proxy. In my own repository, for example, internal artifacts (in the 'com.wakaleo' domain) are deployed to the local Snapshots and Releases repositories. Likewise, no 'org.apache' artifacts are ever deployed to these repositories - it would be a waste of time to look for them there. So, as shown in Figure 4 I have configured Nexus to only look for the internal artifacts in the Snapshots and Releases repositories, and never to look for the Apache artifacts in these repositories. In a similar vein, Nexus also lets you set up mirrors for your repositories, so that requests to a particular repository will go to a closer or faster mirror instead. This sort of fine-tuning reduces the number of places Nexus needs to look for a given artifact, and speeds up performance accordingly.
Figure 4. Setting up Routes in Nexus to optimize downloads by giving it hints about where to find certain artifacts.
The Repository Manager as part of the build lifecycle
But that's not all a repository manager can do for you. The second fundamental function of a repository manager is to make it easier to share build artifacts (JAR files, WAR files, and so forth) within an organization. You do this by setting up hosted repositories, internal repositories that are designed to store your own internal JAR files as well as proprietary JAR files not available from the public repositories.
Internal artifacts are typically released and deployed following a well-defined build lifecycle. For example, while work is in progress on a particular version, releases will only be made available for team members. Once the product is ready for further testing, it might be released to a test or UAT (User Acceptance Testing) platform, and then to production. Depending on your infrastructure setup, your repository manager can play a key role in this process.
In Maven development, for instance, it is good practice to distinguish between SNAPSHOT versions, which are under active development and are not considered stable or finalized, and RELEASE versions, which are tested and officially made public with a unique version number. A common best practice which is well supported by all the repository managers involves deploying SNAPSHOT versions and RELEASE versions to different hosted repositories within your repository manager.
Security considerations are also important. Larger organizations may want to limit deployment rights for projects, so that only developers from a given project team are allowed to deploy snapshot or release artifacts for that project to the repository. One way to do this would be to set up a separate repository for each project, but this is not a very scalable solution. A better approach is to define rules defining who is allowed to deploy to different parts of the repository. Nexus in particular provides a particularly fine-grained security model that lets you specify rules like this using a combination of regular expressions and pre-defined 'privileges'. The Apache repository is a good example of this - all of the release artifacts are stored in a single hosted Nexus repository, but developers for each of the many hosted projects can only deploy artifacts for their own projects.
For larger organizations, it is often important to be able to work with an existing LDAP repository when configuring what users are allowed to do with the repository. Artifactory supports basic authentication LDAP. As of its latest version (1.4.2), Nexus comes with sophisticated LDAP support, including both authentication and support for more advanced features such as mapping LDAP groups to Nexus roles.
The repository manager can also play a more active role in the build lifecycle. Typically, once a version has been released, it needs to be tested and validated by QA before it can be deployed to production. You can do this by using different repositories for different build promotion stages. Smaller organizations can often get away with a simple architecture consisting of just a snapshots repository (for work in progress) and a release repository (for release candidates and official releases). However, larger organizations will usually need a more sophisticated build promotion strategy.
Figure 5 illustrates one such possible build promotion strategy. In this architecture, snapshot builds are automatically deployed to the Enterprise Snapshots repository, to be used by developers within the project team itself. Release Candidate builds are prepared using the Maven Release plugin, and deployed to the Release Candidates repository. Builds in this repository can be deployed to System and UAT (User Acceptance Testing) testing environments. Once a version has been approved in the UAT environment, it can be promoted and deployed to the official Enterprise Releases repository, from where it can be deployed to production. This staged approach ensures that only properly validated versions can be deployed to production environments.
Figure 5. An example of a build promotion architecture.
In practice, implementing this sort of build promotion architecture requires a little thought and often some manual scripting. Indeed, the build promotion process itself can be tricky. You can rebuild and deploy from the source code for each platform, or redeploy manually to another repository; some repository managers like Artifactory even let you manually copy or move artifacts between repositories, though duplicating the same artifact between repositories is not considered to be good practice.
Nexus Pro, the commercial version of Nexus, comes with the Staging Suite which provides more sophisticated support for the build promotion process. The Staging Suite works by intercepting certain deployed artifacts and placing them in a special staging repository, dynamically-created by Nexus. For example, you might set up a staging configuration to intercept all artifacts from the com.mycompany.killerapp project. Whenever a developer deploys an artifact for this project, Nexus creates a special staging repository and places the newly deployed artifacts here (see Figure 6).
Figure 6. Staged artifacts are deployed to a special repository created by Nexus.
Once all the artifacts are deployed, the administrator can lock down the staging repository and make it available for further testing (for example, via an automated deployment to a QA platform). One approach is to make the staging repository available in a special Release Candidate repository group, from which it can be deployed to the various UAT and other testing environments (see Figure 7). At this point, other interested users will be notified that the artifacts have been staged.
Figure 7. Adding the staged repository to the Release Candidate repository group
If the testing is conclusive, the administrator can promote the build and deploy it to it's final repository (see Figure 8). If not, the administrator simply drops the staging repository and the developer redeploys a fixed version later on.
Figure 8. Promoting an artifact in Nexus.
There is also a Maven plugin for the Nexus Staging plugin, allowing this process to be integrated into Hudson or another CI tool.
This whole process makes generating releases from Maven projects a relatively smooth and painless operation. It is also a good example of how a repository manager is not just a directory of JAR files, but can underpin your whole deployment lifecycle as well.
Keeping your repository clean
A repository manager can also help you keep your repository clean. For example, automated builds can take up a lot of disk space, especially if you are generating regular snapshot builds. A repository manager will help you avoid wasting space with excessive numbers of snapshots. Nexus, for example, will regularly delete older snapshots, with the option of deleting all the snapshots for a particular artifact once that artifact is released.
A Maven repository manager is much more than a file server. In addition to the obvious performance benefits, using a repository manager also gives you more control over what artifacts are being downloaded, and from where. More advanced features such as Procurement can give an even finer control over the dependencies being used in development and/or being deployed to production.
A repository manager also can and should play a key role in your build promotion strategy. For smaller organizations, a simple snapshot/release repository setup may be enough, whereas larger organizations will often want to use the repository manager to underpin the entire build promotion and staging process.