Collagens are a large family of triple helical proteins that are widespread throughout the body and are important in a broad range of processes, including tissue scaffolding, cell adhesion, cell migration, cancer, angiogenesis, tissue morphogenesis and tissue repair. Collagen is best known as the principal tensile element of vertebrate tissues such as tendon, cartilage, bone and skin, where it occurs in the extracellular matrix as elongated fibrils. Collagen is also well known for its location in basement membranes – for example, in the kidney glomerulus, where it functions in molecular filtration. However, the identification of transmembrane collagens on the surfaces of a wide variety of cells, and of collagens that are precursors of bioactive peptides with paracrine functions, has resulted in a revival of interest in collagen. Moreover, new developments in 3D reconstruction electron microscopy have led to new opportunities for studying intracellular trafficking of collagen. Newcomers to the field face the daunting task of sifting through 100,000 research papers that span 40 years. Here, we provide 'the collagen basics'. Several excellent reviews are cited that are sources of more detailed descriptions and discussions.
Structure and composition
Collagens contain three polypeptide (α) chains, displaying an extended polyproline-II conformation, a right-handed supercoil and a one-residue stagger between adjacent chains (Brodsky and Persikov, 2005). Each polypeptide chain has a repeating Gly-X-Y triplet in which glycyl residues occupy every third position and the X and Y positions are frequently occupied by proline and 4-hydroxyproline, respectively. The three α chains are held together by interchain hydrogen bonds. Highly ordered hydration networks surround the triple helices. The significance of these interactions to collagen stability remains a matter of debate. Some collagens have interruptions (containing numerous residues) and imperfections (one to three residues) in the triple helix. The conformational changes derived from some simple imperfections have been visualised in crystal structures of model peptides (Bella et al., 2006). Integrin adhesion sequences [e.g. Arg-Gly-Asp (RGD) and Gly-Phe-O-Gly-Arg, where O is hydroxyproline] occur in the triple helical domain of several collagens and contribute to integrin-ligand-binding specificity (Bella and Humphries, 2005).
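The Gly-X-Y rule lends itself to a simple sequence check. The following Python sketch is illustrative only. It assumes a one-letter sequence in which O denotes 4-hydroxyproline (as in the integrin-binding motif above) and a fixed reading frame that starts at the first residue. The function names are hypothetical, and genuine interruptions, which shift the reading frame, are not handled.

```python
from typing import List, Tuple


def find_triplet_breaks(seq: str) -> List[int]:
    """Return 0-based indices of complete triplets that do not start with Gly.

    Assumes the Gly-X-Y frame begins at the first residue; trailing
    residues that do not fill a complete triplet are ignored.
    """
    seq = seq.upper()
    n_triplets = len(seq) // 3
    return [t for t in range(n_triplets) if seq[3 * t] != "G"]


def xy_occupancy(seq: str) -> Tuple[float, float]:
    """Return (fraction of X positions that are Pro,
    fraction of Y positions that are hydroxyproline, written as 'O')."""
    seq = seq.upper()
    n_triplets = len(seq) // 3
    if n_triplets == 0:
        raise ValueError("sequence must contain at least one complete triplet")
    pro_at_x = sum(1 for t in range(n_triplets) if seq[3 * t + 1] == "P")
    hyp_at_y = sum(1 for t in range(n_triplets) if seq[3 * t + 2] == "O")
    return pro_at_x / n_triplets, hyp_at_y / n_triplets


# Hypothetical model peptide containing the Gly-Phe-O-Gly-Arg motif.
peptide = "GPOGPOGFOGERGPO"
breaks = find_triplet_breaks(peptide)       # no triplet lacks its glycine
x_pro, y_hyp = xy_occupancy(peptide)
```

A glycine substitution, for example `GPOGPOAPOGPO`, is reported as a break at triplet index 2. This corresponds to the simplest kind of imperfection described above.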
Nomenclature and classification
There is no agreed definition for a collagen; there are triple helical proteins that are called collagens and there are proteins that have triple helical domains that are not regarded as collagens. In general, collagens are regarded as triple helical proteins that have functions in tissue assembly or maintenance. Inevitably, the line between 'collagens' and 'collagen-like' proteins is blurred. Each vertebrate collagen type is designated by a Roman numeral.
Collagen I is the archetypal collagen in that it is trimeric, it is triple helical, its triple helix has no imperfections, it assembles into fibrils, and it has a predominantly structural role in the tissue. However, most collagens differ from collagen I in one or more respects. For example, other collagens can have interruptions in the triple helix and do not necessarily assemble (in their own right) into fibrils. Furthermore, transmembrane collagens have numerous interruptions in the triple helix, do not self-assemble into fibrils, and have roles in cell adhesion and signaling.
At least 28 different collagens occur in vertebrates (numbered I-XXVIII; some with common names), together with a large group of collagen-like proteins (e.g. acetylcholinesterase, adiponectin, C1q, ficolin, macrophage receptor and surfactant protein) (for a review, see Myllyharju and Kivirikko, 2001). In general, invertebrates have far fewer collagen genes but most have examples of fibrillar and basement membrane collagen (Huxley-Jones et al., 2007). Most collagens have evolved from orthologues present in invertebrates, but little is known about the molecular composition of these invertebrate molecules and no systematic nomenclature exists for them. Notably, ∼200 cuticle-forming collagens are found in Caenorhabditis elegans (Page and Winter, 2003).
Collagens can be heterotrimeric – for example, type I collagen, which contains two identical α chains and a third chain that differs, [α1(I)]2α2(I). However, the majority of collagens are homotrimers – for example, collagen II, which contains three identical α chains, [α1(II)]3. Note that the α1 chain of one type of collagen (e.g. collagen I) has a primary structure different from that of the α1 chain of another type of collagen (e.g. collagen II). Collagens have non-triple helical domains at their N- and C-termini. These domains are called 'non-collagenous' (NC) domains and are numbered from the C-terminus (NC1, NC2, etc.).
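The chain-composition notation can be made concrete with a short Python sketch. It is a minimal illustration that treats a collagen molecule as a tuple of three chain names. The function names and this representation are assumptions made here for clarity, not an established convention.

```python
from collections import Counter
from typing import Sequence


def classify_trimer(chains: Sequence[str]) -> str:
    """Classify a three-chain molecule as a homotrimer or heterotrimer."""
    if len(chains) != 3:
        raise ValueError("a collagen molecule contains exactly three α chains")
    return "homotrimer" if len(set(chains)) == 1 else "heterotrimer"


def format_composition(chains: Sequence[str]) -> str:
    """Write the composition in bracket notation, e.g. [α1(I)]2α2(I)."""
    if len(chains) != 3:
        raise ValueError("a collagen molecule contains exactly three α chains")
    counts = Counter(chains)
    parts = []
    for chain in sorted(counts):          # sorted for a stable chain order
        n = counts[chain]
        parts.append(f"[{chain}]{n}" if n > 1 else chain)
    return "".join(parts)


collagen_i = ("α1(I)", "α1(I)", "α2(I)")
collagen_ii = ("α1(II)", "α1(II)", "α1(II)")

kind_i = classify_trimer(collagen_i)          # "heterotrimer"
kind_ii = classify_trimer(collagen_ii)        # "homotrimer"
notation_i = format_composition(collagen_i)   # "[α1(I)]2α2(I)"
notation_ii = format_composition(collagen_ii) # "[α1(II)]3"
```

Because the chain name carries the collagen type, α1(I) and α1(II) are treated as distinct chains, in keeping with the note above that α1 chains of different collagen types differ in primary structure.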
Vertebrate collagens are classified by function and domain homology
Fibril-forming collagens
Fibril-forming collagens occur as 67-nm D-periodic fibrils that are the principal source of tensile strength in animal tissues (Kadler et al., 1996). The fibrils are indeterminate in length and range in diameter from 12 nm to >500 nm, depending on the stage of development and tissue. The periodic structure of the fibril is due to regular staggering of triple helical collagen molecules. Mammals have 11 fibrillar collagen genes, which cluster phylogenetically into three distinct subclasses (Huxley-Jones et al., 2007). The Gly-X-Y domain of fibril-forming collagens contains ∼1000 residues and is uninterrupted, with the notable exception of collagen XXIV and collagen XXVII.
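The arithmetic behind regular staggering can be sketched in a few lines of Python. The 67-nm D-period is taken from the text. The molecule length of about 300 nm is an assumption introduced here (a commonly quoted approximate value) and the function names are hypothetical. The sketch assumes a simple one-dimensional model in which neighbouring molecules are displaced axially by whole multiples of D.

```python
import math
from typing import List, Tuple

D_PERIOD_NM = 67.0             # D-period stated in the text
ASSUMED_MOLECULE_NM = 300.0    # assumption: approximate molecule length


def axial_offsets(n_molecules: int, d_nm: float = D_PERIOD_NM) -> List[float]:
    """Axial displacement of successive molecules staggered by one D-period."""
    return [k * d_nm for k in range(n_molecules)]


def overlap_and_gap(molecule_nm: float,
                    d_nm: float = D_PERIOD_NM) -> Tuple[float, float]:
    """Split one D-period into overlap and gap zones (both in nm).

    A molecule that is not a whole number of D-periods long leaves a
    fractional remainder (the overlap); the rest of the period is the gap.
    """
    periods = molecule_nm / d_nm
    overlap_fraction = periods - math.floor(periods)
    return overlap_fraction * d_nm, (1.0 - overlap_fraction) * d_nm


offsets = axial_offsets(5)                           # 0, 67, 134, 201, 268 nm
overlap_nm, gap_nm = overlap_and_gap(ASSUMED_MOLECULE_NM)
```

Under these assumptions the overlap and gap zones always sum to one D-period, which is why the fibril shows a single repeating 67-nm unit regardless of the exact molecule length.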
Fibril-forming collagens are synthesised as procollagens that carry an N-propeptide and a C-propeptide at the respective ends of the triple helical domain. Cleavage of the C-propeptides is required for fibrillogenesis. The C-propeptides are cleaved by procollagen C-proteinases, which are identical to the BMP-1/tolloid proteinases (Greenspan, 2005); the exception is the proα1(V) chain of collagen V, whose C-propeptide is released by furin. The N-propeptides are cleaved by procollagen N-proteinases, which are identical to the ADAMTS 2, ADAMTS 3 and ADAMTS 14 proteinases (Colige et al., 2005). Here, too, the proα1(V) chain is the exception: its N-propeptide is cleaved by BMP-1 (for details, see Greenspan, 2005). Cleavage of the propeptides exposes the telopeptides, which are short non-triple helical extensions of the polypeptide chains. The telopeptides contain binding sites for fibrillogenesis (Prockop and Fertala, 1998). The fibrillar collagens are stabilised by non-reducible covalent crosslinks that involve residues in the triple helix and in telopeptides (Eyre et al., 1984). The crosslinks are essential for the normal mechanical properties of collagen-containing tissues.
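The processing rules and their exceptions can be summarised as a small lookup, sketched here in Python. This is an illustrative encoding of the paragraph above, not a curated database. It reads the BMP-1 exception for proα1(V) as applying to the N-propeptide, and the dictionary and function names are hypothetical.

```python
# General rule: which proteinases remove each propeptide of a fibrillar procollagen.
DEFAULT_PROTEINASES = {
    "C": "BMP-1/tolloid proteinases",
    "N": "ADAMTS 2, ADAMTS 3 and ADAMTS 14",
}

# Chain-specific exceptions named in the text (collagen V only).
EXCEPTIONS = {
    ("proα1(V)", "C"): "furin",
    ("proα1(V)", "N"): "BMP-1",
}


def propeptide_proteinase(chain: str, propeptide: str) -> str:
    """Return the proteinase(s) that cleave the given propeptide ('N' or 'C')."""
    if propeptide not in DEFAULT_PROTEINASES:
        raise ValueError("propeptide must be 'N' or 'C'")
    # Fall back to the general rule when no exception is recorded.
    return EXCEPTIONS.get((chain, propeptide), DEFAULT_PROTEINASES[propeptide])


general_c = propeptide_proteinase("proα1(I)", "C")   # general rule
col5_c = propeptide_proteinase("proα1(V)", "C")      # furin
col5_n = propeptide_proteinase("proα1(V)", "N")      # BMP-1
```

Keeping the exceptions separate from the defaults mirrors the structure of the text: one general rule per propeptide, with collagen V handled as a special case.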
Fibril-associated collagens with interrupted triple helices (FACITs)
FACITs are relatively short collagens, have interruptions in the triple helical domain and can be found at the surfaces of collagen fibrils. Collagen IX is the archetypal FACIT; it is covalently crosslinked to collagen II (Wu et al., 1992), and is post-translationally modified to carry a glycosaminoglycan side chain.
Network-forming collagens
Collagen IV is the prototypical network-forming collagen. It forms an interlaced network in basement membranes, where it has an important molecular filtration function. The network is generated by head-to-head interactions of two trimeric NC1 domains. The resultant hexamer is stabilised by covalent Met-Lys crosslinks (Than et al., 2002). N-to-N interactions between four collagen IV molecules establish the crosslinked '7S domain', which is an important interaction node in the extended network. Collagen VIII is a major component of Descemet's membrane and vascular subendothelial matrices, where it occurs as polygonal superstructures. The related collagen X occurs in the hypertrophic zone of growth plate cartilage and is thought to form a network similar to that of collagen VIII (Stephan et al., 2004).
Transmembrane collagens
Transmembrane collagens are type II transmembrane proteins that have a short cytosolic N-terminal domain and a long, interrupted triple helical extracellular domain (ectodomain). They include collagens XIII and XXV, which have cell adhesive properties and occur on numerous cell types, including malignant cells. The ectodomains can be proteolytically shed by furin-like proprotein convertases. Collagen XVII is cleaved by ADAM family proteinases. A growing number of collagen-like transmembrane proteins that have triple helical ectodomains are being identified in vertebrates and invertebrates. These have not been assigned to a specific class but have cell adhesive functions and important roles in neural function, neural tube dorsalisation, eye development and modulation of growth factor activity. These un-adopted collagens include ectodysplasin, gliomedin and other members of the colmedin subfamily of transmembrane collagens. The ectodomain of gliomedin is shed by BMP-1/tolloid proteinases (Maertens et al., 2007).
Endostatin-producing collagens
Collagen XV is found bridging adjacent collagen fibrils near basement membranes and can form a variety of oligomeric assemblies (Myers et al., 2007). Collagen XVIII is found in some basement membranes. Cleavage of part of the NC1 domains of collagens XV and XVIII releases endostatins, which are inhibitors of endothelial cell migration and angiogenesis, reduce tumour growth in animals, and control neuronal guidance in C. elegans (Marneros and Olsen, 2005).
Anchoring fibrils
Collagen VII is the major component of the anchoring fibrils beneath the lamina densa of epithelia. The NC1 domain of collagen VII is cleaved by BMP-1/tolloid proteinases.
Beaded-filament-forming collagens
Collagen VI is the archetypal beaded-filament-forming collagen. It is found in most tissues, where it forms structural links with cells. Collagen VI monomers crosslink into tetramers that assemble into long molecular chains known as microfibrils, which have a beaded repeat of 105 nm.
Collagens and disease
The importance of collagen is exemplified in a wide spectrum of diseases, which are caused by >1000 mutations; see OMIM (Online Mendelian Inheritance in Man) for a comprehensive listing (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM). These diseases include Alport syndrome (collagen IV), certain arterial aneurysms (collagen III), Bethlem myopathy and Ullrich muscular dystrophy (collagen VI) (Baker et al., 2005), certain chondrodysplasias (collagens IX and XI), some subtypes of the Ehlers-Danlos syndrome (collagens I and V), specific subtypes of epidermolysis bullosa (collagen VII), Kniest dysplasia (collagen II), Knobloch syndrome (collagen XVIII), osteogenesis imperfecta (collagen I) (Marini et al., 2007), some instances of osteoporosis and osteoarthrosis, and Stickler syndrome (collagens II, IX and XI) (Van Camp et al., 2006). Mice with genetically engineered collagen mutations have been produced. Collagen accumulates or is ectopically expressed in adhesions, fibrosis, cirrhosis, cardiovascular disease and scars.
Biosynthesis
Collagens undergo extensive post-translational modification in the endoplasmic reticulum prior to triple helix formation. A number of enzymes and molecular chaperones assist in their correct folding and trimerisation. These include several hydroxylases, two collagen glycosyltransferases and peptidyl-prolyl cis-trans isomerase, in addition to protein disulphide isomerase (PDI) (for a review, see Myllyharju and Kivirikko, 2004). HSP47 is a collagen-specific molecular chaperone that is essential for the normal synthesis of collagen (Nagata, 2003). Fibril-forming collagen molecules fold in a C- to N-terminal direction. The correct trimerisation of the NC1 domains is crucial for collagen assembly and has precluded the use of antigenic tags or fluorescent proteins at the C-terminus of the chains. Folding of the trimeric NC1 domain involves the formation of intra- and inter-chain disulphide bonds (Lees and Bulleid, 1994).
Intracellular transport and fibril assembly
Collagens are relatively large proteins and are not accommodated in conventional small-diameter transport vesicles (Bonfanti et al., 1998; Trucco et al., 2004). Procollagen I can be cleaved to collagen inside the cell, and intracellular collagen fibrils can occur in plasma membrane protrusions called fibripositors (Canty et al., 2004; for a review, see Canty and Kadler, 2005). In embryonic stages, collagen fibrils are closely associated with the plasma membrane of tendon and corneal fibroblasts (Birk and Trelstad, 1986). Collagen V is needed for the nucleation of collagen-I-containing fibrils in vivo (Wenstrup et al., 2004).
Degradation
The triple helix is resistant to proteolytic cleavage by pepsin, trypsin and papain. Clostridium histolyticum produces collagenases that cleave triple helices at numerous sites. The ability of collagens to resist cleavage by pepsin and trypsin, and their sensitivity to cleavage by bacterial collagenase, are used as research tools to identify and characterise collagens. Degradation of collagen and gelatin (unfolded or denatured collagen) in vivo can be mediated by MMPs, cysteine proteinases (e.g. cathepsins B, K and L), and serine proteinases (e.g. plasmin and plasminogen activator) (for reviews, see Everts et al., 1996; Sabeh et al., 2004).