
How does a maven repository work?

TL;DR

Similar to our APT Repository Internals and YUM Repository Internals posts, this post aims to illustrate the inner workings of a Maven repository. Read on if you have ever been curious as to how mvn compile figures out which dependencies to download and how to retrieve them in order to build your project.

 

Overview

In this post we’ll examine how dependencies are defined and resolved within your maven project, then we’ll dive into how maven repositories make these dependencies available for consumption.

 

What is a maven dependency?

A maven dependency is an artifact that your project (or Maven itself, in the case of Maven plugins) needs to have during the maven build lifecycle.

These are declared in the <dependencies/> section of your project’s pom.xml file like this:

<dependencies>
  <dependency>
    <groupId>io.packagecloud</groupId>
    <artifactId>client</artifactId>
    <version>3.0.0</version>
  </dependency>
</dependencies>

Maven Coordinates

Most dependency declarations consist of groupId, artifactId, and version fields. A group of these key/value pairs is referred to as the Maven Coordinates for a particular dependency. Much like geographical coordinates, they allow you to specify a particular dependency precisely and unambiguously.

 

How does maven locate and resolve dependencies?

Unlike other repository formats (APT, YUM, Rubygems), there is no main index file that enumerates all possible artifacts available for that repository. Instead, Maven uses the coordinate values for a given dependency to construct a URL according to the Maven repository layout.

Maven Repository Layout mapping

For primary artifacts (explained below) the URL template looks like:

/$groupId[0]/../$groupId[n]/$artifactId/$version/$artifactId-$version.$extension

 

Rules for $groupId

According to the specification, the rule is:

$groupId is a array of strings made by splitting the groupId’s on “.” into directories.

So for the groupId value of org.example.subdepartment, our $groupId array would be [org, example, subdepartment], which when translated into directories, becomes org/example/subdepartment.

 

Primary Artifacts

One of the core features of Maven is its ability to handle Transitive Dependencies. That is, to find and download the dependencies of your dependencies, and their dependencies also, recursively, until they are all satisfied.

Just as your own Maven project has a pom.xml file listing its main dependencies, those dependencies also have a remote pom file serving a similar purpose. Maven uses this file to figure out what other dependencies to download. When a coordinate does not contain a classifier, it is considered a primary artifact and is expected to have a pom available.

Let’s resolve the pom and jar for the given coordinates at the beginning of this post:

<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0</version>
</dependency>

pom

We turn the groupId of io.packagecloud into /io/packagecloud, then construct the rest of the URL with $artifactId and $version, like so:

/io/packagecloud/client/3.0.0/client-3.0.0.pom

 

jar

Similarly, for the extension of jar:

/io/packagecloud/client/3.0.0/client-3.0.0.jar
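The mapping above can be sketched in a few lines of Python. This is only an illustration of the URL template, not Maven's own implementation; the function name primary_artifact_path is made up for this example.

```python
def primary_artifact_path(group_id: str, artifact_id: str,
                          version: str, extension: str) -> str:
    """Build the repository-relative path for a primary artifact."""
    # Each "."-separated segment of the groupId becomes a directory.
    group_dirs = group_id.split(".")
    filename = f"{artifact_id}-{version}.{extension}"
    return "/" + "/".join(group_dirs + [artifact_id, version, filename])


print(primary_artifact_path("io.packagecloud", "client", "3.0.0", "pom"))
print(primary_artifact_path("io.packagecloud", "client", "3.0.0", "jar"))
```

The two calls reproduce the pom and jar paths shown above for the io.packagecloud client coordinates.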

 

Secondary Artifacts

Secondary artifacts, or “attached artifacts”, are ancillary files published alongside a primary artifact that you may want Maven to download. Most often they are used to download the javadocs and/or sources for a particular dependency. However, unlike a primary artifact, a secondary artifact is not expected to have a remote pom and thus never has any dependencies of its own.

They can be specified in the <dependencies/> section just like primary artifacts:

<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0</version>
  <classifier>sources</classifier>
</dependency>

Or, you can download them using mvn dependency:get, like so:

$ mvn dependency:get -DgroupId=io.packagecloud   \
                     -DartifactId=client         \
                     -Dversion=3.0.0             \
                     -Dclassifier=sources

The URL template for secondary artifacts is just like the one for primary artifacts, but with an additional $classifier variable:

/$groupId[0]/../$groupId[n]/$artifactId/$version/$artifactId-$version-$classifier.$extension

javadoc

/io/packagecloud/client/3.0.0/client-3.0.0-javadoc.jar

 

sources

/io/packagecloud/client/3.0.0/client-3.0.0-sources.jar
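As a rough sketch, the classifier can be treated as an optional part of the file name. The helper below is hypothetical (the name artifact_path is ours) and only mirrors the two URL templates in this post.

```python
from typing import Optional


def artifact_path(group_id: str, artifact_id: str, version: str,
                  extension: str, classifier: Optional[str] = None) -> str:
    """Build a repository-relative path, with or without a classifier."""
    directory = "/".join(group_id.split(".") + [artifact_id, version])
    # Secondary artifacts insert "-$classifier" before the extension.
    suffix = f"-{classifier}" if classifier else ""
    return f"/{directory}/{artifact_id}-{version}{suffix}.{extension}"


print(artifact_path("io.packagecloud", "client", "3.0.0", "jar", "sources"))
print(artifact_path("io.packagecloud", "client", "3.0.0", "jar", "javadoc"))
```

Leaving the classifier out falls back to the primary artifact template.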

 

Checksums

To verify a downloaded artifact, Maven computes its md5 or sha1 checksum and compares it to the value found in the checksum file located at $ARTIFACT_URL.md5 or $ARTIFACT_URL.sha1, respectively.

NOTE: This is strictly meant as a way to quickly verify downloads, and it is NOT meant to be used for authentication or security purposes. This is also NOT a substitute for using HTTPS, as checksums can be trivially intercepted and modified along with the modified artifacts.

sha1

For example, the sha1 file for our jar artifact would be located at:

/io/packagecloud/client/3.0.0/client-3.0.0.jar.sha1

 

md5

Similarly, the md5 file for our pom artifact would be located at:

/io/packagecloud/client/3.0.0/client-3.0.0.pom.md5
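The comparison itself is simple. Here is a minimal sketch using Python's hashlib, assuming the checksum file holds the hex digest as its first whitespace-separated token (some repositories append a file name after it). The function name checksum_matches is hypothetical, and this is not how Maven implements the check internally.

```python
import hashlib


def checksum_matches(artifact_bytes: bytes, checksum_file_text: str,
                     algorithm: str) -> bool:
    """Compare an artifact's digest with the contents of its checksum file."""
    # Assumption: the digest is the first token; anything after it is ignored.
    tokens = checksum_file_text.split()
    if not tokens:
        return False
    expected = tokens[0].lower()
    actual = hashlib.new(algorithm, artifact_bytes).hexdigest()
    return actual == expected


data = b"example artifact bytes"  # stand-in for a downloaded jar
sha1_file = hashlib.sha1(data).hexdigest() + "  client-3.0.0.jar\n"
print(checksum_matches(data, sha1_file, "sha1"))
print(checksum_matches(data + b"tampered", sha1_file, "sha1"))
```

As the note above says, a matching checksum only tells you the download was not corrupted in transit.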

 

Signed Artifacts

To verify the authenticity of downloaded artifacts, you can configure Maven to download and validate the cryptographic signatures for the artifacts and checksums it downloads (if available).

$artifact.asc

The artifact is signed, and both the artifact and its signature are deployed to the repository at the following URLs:

/io/packagecloud/client/3.0.0/client-3.0.0.jar
/io/packagecloud/client/3.0.0/client-3.0.0.jar.asc

 

$checksum.asc

The checksums for those artifacts are also signed and deployed at the following URLs:

/io/packagecloud/client/3.0.0/client-3.0.0.jar.md5
/io/packagecloud/client/3.0.0/client-3.0.0.jar.md5.asc
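Put together, the signature locations follow mechanically from the artifact path. The sketch below lists them; the function name is made up, and the .sha1.asc entry is an assumption by analogy with the .md5.asc example above, since a given repository may not publish every file.

```python
from typing import List, Sequence


def signature_paths(artifact_path: str,
                    checksum_algorithms: Sequence[str] = ("md5", "sha1")) -> List[str]:
    """List where signatures for an artifact and its checksums would live."""
    # The artifact's own signature sits next to it with ".asc" appended.
    paths = [artifact_path + ".asc"]
    # Each checksum file gets its own ".asc" signature as well.
    for algorithm in checksum_algorithms:
        paths.append(f"{artifact_path}.{algorithm}.asc")
    return paths


for path in signature_paths("/io/packagecloud/client/3.0.0/client-3.0.0.jar"):
    print(path)
```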

What is a maven repository?

A Maven repository is wherever these constructed artifact URLs live. Most of the time, this is a Web server with a /maven2 document root, but it can actually be any protocol Maven has a transport plugin for.

To make it easier for humans to discover artifacts, most Web-based repositories are configured to render virtual directory listings. For instance, the Maven Central repository lets you browse the entire org.apache group this way: http://repo1.maven.org/maven2/org/apache/.

 

The local repository

Before Maven attempts to download a particular artifact from a remote repository it checks the local repository. This is usually located at $HOME/.m2/repository. The local repository follows the same standard repository layout as remote repositories.

 

Remote repositories

Remote repositories are defined in your project’s pom.xml file under the <repositories/> section. For example:

<repositories>
  <repository>
    <id>computology-packagecloud-test-packages</id>
    <url>https://packagecloud.io/computology/packagecloud-test-packages/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

You’ll notice that besides the <url/> and <id/> elements, there are two more elements, <releases> and <snapshots>, each holding a boolean <enabled> flag.

If you are on Maven 2.x, then this would be <repository/> and <snapshotRepository/>, respectively. Previously, <repository/> definitions were implicitly release repositories, and it was not possible to support both releases and snapshots.

Repository search order

As of Maven 3.x, repositories are searched in the order in which they are declared.
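To make the lookup order concrete, here is a simplified sketch: check the local repository first, and otherwise produce one candidate URL per remote repository in declaration order. The function name and the second repository URL are hypothetical, and real Maven does more bookkeeping than this (mirrors and update policies, for example).

```python
import tempfile
from pathlib import Path
from typing import List


def lookup_order(relative_path: str, local_repo: Path,
                 remote_urls: List[str]) -> List[str]:
    """Return the places to look for an artifact, in order."""
    relative = relative_path.lstrip("/")
    local_file = local_repo / relative
    # A copy in the local repository wins; no remote request is needed.
    if local_file.is_file():
        return [str(local_file)]
    # Otherwise try each remote repository in the order it was declared.
    return [url.rstrip("/") + "/" + relative for url in remote_urls]


with tempfile.TemporaryDirectory() as tmp:  # an empty stand-in for ~/.m2/repository
    remotes = [
        "https://packagecloud.io/computology/packagecloud-test-packages/maven2",
        "https://repo.example.com/maven2",  # hypothetical second repository
    ]
    path = "/io/packagecloud/client/3.0.0/client-3.0.0.jar"
    for candidate in lookup_order(path, Path(tmp), remotes):
        print(candidate)
```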

 

Release and SNAPSHOT repositories

As seen above, there are two features that can be enabled on repositories, even at the same time.

Release repositories

This is enabled by default on all defined repositories and it simply means that this repository should be added to the list of repositories to use for resolving “released” artifacts. These are artifacts that once published to a coordinate, must not be changed.

Because of the heavily cached and distributed nature of Maven repositories (think of everyone's local repository and remote mirrors), you are strongly discouraged from deleting and republishing a changed artifact under the same coordinates. Unless every copy of the previous artifact can be purged from all repositories containing it, this makes it difficult to ensure that everyone receives the same artifact given the same coordinates.

 

SNAPSHOT repositories

When a repository has the “snapshot” feature enabled, this means that Maven will add this to the list of repositories to use only when resolving SNAPSHOT versions of your dependencies.
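The effect of the two flags can be sketched as a simple filter over the declared repositories. The dictionaries below are a hypothetical stand-in for parsed <repository/> definitions with both flags spelled out explicitly; this is an illustration rather than Maven's actual selection logic.

```python
from typing import Dict, List


def repositories_for(version: str, repositories: List[Dict]) -> List[str]:
    """Return the ids of repositories eligible for a version, in declared order."""
    # SNAPSHOT versions consult the "snapshots" flag, everything else "releases".
    feature = "snapshots" if version.endswith("-SNAPSHOT") else "releases"
    return [repo["id"] for repo in repositories if repo[feature]]


declared = [
    {"id": "central", "releases": True, "snapshots": False},
    {"id": "computology-packagecloud-test-packages",
     "releases": True, "snapshots": True},
]

print(repositories_for("3.0.0", declared))
print(repositories_for("3.0.0-SNAPSHOT", declared))
```

With these definitions, a release version may be resolved from either repository, while a SNAPSHOT version is only looked up in the one that has snapshots enabled.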

What are SNAPSHOT versions?

Having to increase the version and permanently release your software every iteration can painfully lengthen your feedback cycles. Maven solves this problem with SNAPSHOT versions.

SNAPSHOT version dependencies look just like regular dependencies, except the version will have -SNAPSHOT appended to it. For example:

<dependency>
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
</dependency>

The idea is that you can continuously push your latest changes to 3.0.0-SNAPSHOT and anyone depending on it will get the latest changes every time they build their project. Then, after a few iterations, once everyone is happy with the latest state of 3.0.0-SNAPSHOT, it can be permanently released as 3.0.0, and rapid development can continue on 3.0.1-SNAPSHOT.

 

maven-metadata.xml

In order to determine the latest artifact to download for a particular SNAPSHOT version, Maven uses the Standard Repository Layout to locate a maven-metadata.xml file for that dependency. For example, using our SNAPSHOT dependency above, Maven constructs the following URL:

/io/packagecloud/client/3.0.0-SNAPSHOT/maven-metadata.xml

This file looks like this:

<metadata modelVersion="1.1.0">
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20161003.234325</timestamp>
      <buildNumber>2</buildNumber>
    </snapshot>
    <lastUpdated>20161003234325</lastUpdated>
    <snapshotVersions>
      <snapshotVersion>
        <extension>jar</extension>
        <value>3.0.0-20161003.234325-2</value>
        <updated>20161003234325</updated>
      </snapshotVersion>
      <snapshotVersion>
        <extension>pom</extension>
        <value>3.0.0-20161003.234325-2</value>
        <updated>20161003234325</updated>
      </snapshotVersion>
    </snapshotVersions>
  </versioning>
</metadata>

According to version 1.1.0 of the Maven Repository Metadata Model (latest at time of writing), <snapshotVersion> contains the latest artifact corresponding to this snapshot version.

Using the <value> of that <snapshotVersion> as the $version in our URL construction scheme, we get the following URL for the jar extension:

/io/packagecloud/client/3.0.0-SNAPSHOT/client-3.0.0-20161003.234325-2.jar

Checksums and signatures work as expected:

/io/packagecloud/client/3.0.0-SNAPSHOT/client-3.0.0-20161003.234325-2.jar.asc
/io/packagecloud/client/3.0.0-SNAPSHOT/client-3.0.0-20161003.234325-2.jar.md5

As more snapshot artifacts are pushed to 3.0.0-SNAPSHOT, the maven-metadata.xml will always get updated to reflect the latest <snapshotVersion> to use.
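The snapshot resolution steps can be sketched with Python's standard XML parser. The metadata string is a trimmed copy of the example file above, the function name is hypothetical, and the code assumes the metadata has no XML namespace, as in that example.

```python
import xml.etree.ElementTree as ET

METADATA = """<metadata modelVersion="1.1.0">
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
  <versioning>
    <snapshotVersions>
      <snapshotVersion>
        <extension>jar</extension>
        <value>3.0.0-20161003.234325-2</value>
      </snapshotVersion>
      <snapshotVersion>
        <extension>pom</extension>
        <value>3.0.0-20161003.234325-2</value>
      </snapshotVersion>
    </snapshotVersions>
  </versioning>
</metadata>"""


def snapshot_artifact_path(metadata_xml: str, extension: str) -> str:
    """Resolve the path of the latest snapshot artifact for an extension."""
    root = ET.fromstring(metadata_xml)
    group_id = root.findtext("groupId")
    artifact_id = root.findtext("artifactId")
    version = root.findtext("version")  # directory keeps the -SNAPSHOT name
    for entry in root.iter("snapshotVersion"):
        # Skip entries for classified (secondary) artifacts.
        if entry.findtext("extension") == extension and not entry.findtext("classifier"):
            value = entry.findtext("value")  # timestamped version for the file name
            directory = "/".join(group_id.split(".") + [artifact_id, version])
            return f"/{directory}/{artifact_id}-{value}.{extension}"
    raise LookupError(f"no snapshotVersion for extension {extension!r}")


print(snapshot_artifact_path(METADATA, "jar"))
```

Note that the directory keeps the 3.0.0-SNAPSHOT name while the file name uses the timestamped <value>.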

 

Unique vs Non-Unique Snapshots

There are two snapshot “styles” that Maven can use.

Unique Snapshots

These are the snapshot versions detailed in the example above. They use a high-resolution timestamp as a version, and clients must use a maven-metadata.xml file to resolve the latest one. This is the only snapshot style supported by Maven 3.

 

Non-Unique Snapshots

Maven 2 allowed you to set a <uniqueVersion>false</uniqueVersion> on a repository definition. When this behavior is selected, there is no maven-metadata.xml file that is used and “-SNAPSHOT” versions are not treated any differently. The artifact is resolved just like any other. Thus, the URL for our example in a non-unique repository context would look like this:

/io/packagecloud/client/3.0.0-SNAPSHOT/client-3.0.0-SNAPSHOT.jar

This artifact URL simply gets overwritten every time there is a new version pushed up at those coordinates.

Due to the obvious issues this introduces, this style has been deprecated for a while now and is completely unsupported in Maven 3.

 

Maven Central and the Super pom.xml

In addition to your project pom.xml, Maven uses a “Super” pom.xml to inherit some default configuration shared by all Maven installations. This is where the default repository, Maven Central, is defined:

<repositories>
  <repository>
    <id>central</id>
    <name>Maven Repository Switchboard</name>
    <layout>default</layout>
    <url>http://repo1.maven.org/maven2</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>

That is why you can depend on artifacts hosted at Maven Central without having to define the repository.

 

Conclusion

Knowing how Maven constructs URLs and resolves dependencies can help you debug issues with your Maven repository. For more information, be sure to check out the official Maven documentation and Maven Source Code.
