In my previous posts, I’ve discussed how to retrieve the list of top repositories from GitHub,
get the list of packages each one uses, and store all that informationin a RavenDb database. Now, it’s time to query all the data.
RavenDb provides Map/Reduce indexes for querying the data. Let’s see some of the basic ones that interest us.
Repositories/ByName
from repo in docs.Repos
select new {
Name = repo.Name,
FullName = repo.FullName
}
Packages/ByName
from package in docs.Packages
select new {
Name = package.Name
}
Packages/ByTag
from package in docs.Packages
from tag in package.Tags
select new {
Tag = tag
}
These are all simple indexes, which are there mostly for searching and viewing our data. Much more interesting is to see the relations between repositories and packages. Let’s look at an index that allows us to get a list of repositories that use a certain package.Here’s the index definition:
Repositories/ByPackage
from repo in docs.Repos
from package in repo.Packages
from tag in package.Tags
select new {
PackageName = package.Name,
PackageVersion = package.Version
}
Querying this index gives us a nice list of repositories:
We can also create a more complex index, that will tell us which package (regardless of its versions) is the most popularly used:
MAP
from repo in docs.Repos
from packageRef in repo.Packages
let package = LoadDocument(packageRef.Id)
select new {
PackageName = packageRef.Name,
Repos = new string[] { repo.FullName },
Tags = package.Tags,
Count = 1
}
REDUCE
from r in results
group r by r.PackageName into g
select new {
PackageName = g.Key,
Repos = g.SelectMany(x = x.Repos).Distinct(),
Tags = g.SelectMany(x = x.Tags).Distinct(),
Count = g.Sum(x = x.Count)
}
FIELDS
- Tags:
- Storage: Yes
- Indexing: Analyze
- Count:
- Sort: Int
This index allows us to see which packages are the most widely used:
And even drill further into the data, and for example, see which of the packages that have the sql tag is most widely used (and by which repositories):
Here I’ve shown you how to query the data. In my next post, I’ll try to gain some insights from the data, and see if I can see some usage patterns.