www.sciencemag.org/cgi/content/full/336/6080/466/DC1

Supplementary Materials for Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe Pontus Skoglund,* Helena Malmström, Maanasa Raghavan, Jan Storå, Per Hall, Eske Willerslev, M. Thomas P. Gilbert, Anders Götherström,* Mattias Jakobsson* *To whom correspondence should be addressed. E-mail: [email protected] (P.S.); [email protected] (A.G.); [email protected] (M.J.) Published 27 April 2012, Science 336, 466 (2012) DOI: 10.1126/science.1216304 This PDF file includes: Materials and Methods Supplementary Text Figs. S1 to S22 Tables S1 to S15 References (33–85)

Contents 1 Archaeological context 1.1 Burial sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Dating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Osteoarchaeology of the analysed samples . . . . . . . . . . . . . . . . . . .

4 5 5 6

2 Sample selection and DNA extraction

6

3 Library preparation and sequencing

7

4 Data preprocessing and alignment 4.1 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Stampy performance for ancient DNA . . . . . . . . . . . . . . . . . . . . . 4.3 Ste7 resequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

8 8 8 9

5 mtDNA analysis 5.1 Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Haplogroup assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . .

9 9 9

6 Autosomal data processing 6.1 HM3+FINHM+HGDP . 6.2 FINHM+POPRES . . . 6.3 1KGPomni . . . . . . . 6.4 1KGPomni+Sweden . . 6.5 Complete genomes . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

11 11 12 12 12 12

7 Relative sequencing/alignment error rates

13

8 PCA and Procrustes analysis 8.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Validation using Saqqaq data . . . . . . . . . . . . . . . . . . . . . . . . .

13 14 14

9 Model-based clustering

14

10 Authenticity 10.1 Patterns of molecular degradation . . . . . . . . . . . . . . . . . . . . . . . 10.2 Testing for autosomal contamination . . . . . . . . . . . . . . . . . . . . .

15 15 15

11 Allele sharing analysis

16

12 Estimation of population divergence time from complete genomes 12.1 Divergence from genomes of Northwest European ancestry . . . . . . . . . . 12.2 Divergence from genomes of East Asian and West African ancestry . . . . .

16 17 18

13 Modelling European population history 13.1 Testing population topology using genealogical concordance . . . . . . . . . 13.2 Gene flow between a population related to G¨ok4 and extant European populations 13.3 Estimation of Neolithic farmer ancestry in European populations . . . . . . .

18 18 19 20

2

14 The Neolithic hunter-gatherer gene pool is not fully represented in modern Europeans 21

List of Figures S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 S21 S22

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

44 45 46 47 48 49 50 51 52 53 54 55 56 57 58

List of Tables S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .

3

1

Archaeological context

The process of Neolithization in Scandinavia starts around 6,000 before present (BP) when the oldest traces of farming, husbandry and material culture expressions of the Funnel Beaker Culture (TRB; after the German term Trichterbecher Kultur ) appear in the archaeological record (33, 34). The TRB culture of Scandinavia represents the northernmost distribution of a large Western European complex (e.g. 35). The oldest TRB finds in the Southern parts of Scandinavia are roughly contemporary with the end-phase of the Mesolithic hunter-gatherer Erteb¨olle Culture (33). In Sweden, the introduction of TRB marks the transition to the Neolithic Stone Age despite the fact that hunter-gatherer groups are present in large parts of the area (12). The TRB complex is visible in the archaeological record until approximately 5,000 BP (12, 15). In the coastal areas of Southern and Eastern Middle Sweden the Pitted Ware Culture (PWC), representing a local hunter-gatherer complex, appears between the time span 5,300-4,400/4,000 BP (12). Material culture expressions such as pottery style and general subsistence patterns bear the greatest similarities with groups East of the Baltic Sea (e.g. 36 ). The PWC culture represents the youngest (last) hunter-gatherer culture in the area and is contemporary with TRB groups in the early phase, but later the PWC appears side-by-side with the Corded Ware Culture (or Battle Axe Culture; 12 ). The relationship between the farming and the hunter-gatherer complexes has been one of the most debated questions in Scandinavian archaeology (e.g. 12 ) and the question is of general relevance for archaeology and more specifically for the Neolithisation of the Northern parts of Europe (e.g. 35 ). The key question has been whether the TRB culture appear as a result of migrations to the area or whether the change represent adoption of new traditions and material culture expressions by local groups (12). Similar questions have been debated for much of the European continent (3, 5) Osteological studies have revealed a certain degree of morphological differences in the skeleton between individuals from TRB and PWC contexts. It has been shown that PWC individuals exhibit skeletal traits characteristic of cold-adaptation, such as certain facial features and limb proportions (crural index) which are absent in TRB individuals (37,38). Traits in the dentition exhibit both differences (39) and simlarities between the two groups (40). The mean stature of the two groups is roughly similar for males (164.0 cm for TRB and 164.6 cm for PWC) but there is a larger difference for females (154.6 cm for TRB and 157.7 for PWC) (38). The osteological observations are affected by small sample sizes and especially the TRB samples are affected by the collective burial custom. The bones are normally recovered commingled and few skeletal elements may be associated to articulated skeletons. We also note the general difficulties involved in interpreting the reasons for morphological differences in skeletal material. Thus, it has not been possible to make conclusive interpretations of the relationship between the TRB and PWC individuals based on osteological data. However, chemical and biomolecular analyses of human bones have in recent years shed new light on the relationship between material culture units and human groups in general and more specifically the relationship between the TRB and PWC complexes. Isotopic studies (Strontium, Sr) indicate that some level of mobility occurred of those buried in the megaliths in Falbygden, V¨asterg¨otland, and that as much as 25% of the population may have been non-local (by birth) (41). Interestingly, the data indicate that the most likely area of origin of those interpreted as non-locals

4

was to the southwest of the Falbygden area, i.e., areas with abundant TRB remains and artefacts but with few traces of PWC. The analyses indicate a low level of mobility between the geographical areas of PWC and TRB. The relationship between the TRB and PWC has further been investigated by stable analysis of dietary patterns (Carbon, C and Nitrogen, N). Such analyses have shown that on the island ¨ of Oland in the Baltic Sea, the two groups apparently lived side by side but kept distinct and different dietary patterns (42). TRB individuals buried in megalithic structures exhibit a dietary pattern with mainly terrestrial proteins while PWC individuals exhibit a diet based on marine proteins. The dietary pattern at Ajvide and Ire on the island of Gotland is similar ¨ to that of the Pitted Ware Culture individuals on Oland indicating a high intake of marine proteins (43–45). Stable isotope analyses of human remains from several megalithic structures in V¨asterg¨otland, among them samples from the same parish as the TRB individual (G¨ok4) analysed in the present study, show a dietary pattern including proteins mainly of terrestrial origin (44, 46, 47). The TRB and PWC are two distinct archaeological cultural units and the differences in material culture expressions between them may be regarded as fundamental and includes artifact types, burial constructions and burial customs, settlement pattern and subsistence strategies as well as dietary patterns.

1.1

Burial sites

In the present study, aDNA of four individuals from the Stone Age were successfully analysed (Table S1). Three of the individuals originate from two Pitted Ware Culture cemetery and settlement sites on the island of Gotland (Ajv52, Ajv70 and Ire8). One of these sites, Ajvide, is located in the southwestern part of the of Gotland while Ire is located in the northern part of the island. The material culture expressions of the two Gotland sites are characteristic of the PWC and show striking resemblances. One individual (G¨ok4) originates from the megalithic burial structure (G¨okhem 94) characteristic of the Funnel Beaker Culture in G¨okhem parish, V¨asterg¨otland, Sweden. The fifth sample (Ste7) that was excluded from the analyses (see sections 4 and 7) originates from the same megalithic structure. G¨okhem 94 contained more than 9,000 fragments of commingled human bones, together with the remains of four articulated skeletons. The megalith was in (primary) use between approximately 3,400-2,900 BC.

1.2

Dating

The sample from G¨okhem (G¨ok4) has been 14 C-dated to 4,921 ± 50 calBP (approx. 31002900 BC; 15 . The three Neolithic hunter-gatherer samples from Gotland have not been dated directly but several radiocarbon dates of other burials from the two cemeteries indicate a rougly similar age as the individual from G¨okhem. Radiocarbon dates of several burials from the sites fall roughly between 4,360-4,005 BP at Ajvide (48) and between 4,280-3,850 BP for Ire (49). This corresponds roughly to 3,300-2,350 BC and 3,100-2,150 BC in calibrated years. Note that the dates for Ire were done with conventional dating methods while the Ajvide dates were obtained using accelerator mass spectrometry (AMS) and are therefore more reliable.

5

1.3

Osteoarchaeology of the analysed samples

• Ajv52 (Ajvide 52; ID:146): Child, 7 years old. Grave 52 at Ajvide contained skeletons of two children, one aged approximately 7 years and another 1.5-2 years (50). The older individual was sampled for the present study. Grave goods associated with the older child were bone points, beads made of seal teeth and bird bones, a worked pig tusk, and stone implements. • Ajv70 (Ajvide 70; ID:164): Male aged 20-25 years (50 ; Fig. S1). Burial goods recovered were the base of a ceramic pot (PWC) and a worked bone artefact. • Ire8 : Female, aged 40-50 years. Close to the skeleton one shard of pottery and a few fish bones were recovered (49). • G¨ok4 : Female aged approximately 20 years recovered in the megalithic structure G¨okhem 94 (39, 41). The skeleton was not complete. Analyses of Sr in teeth (87Sr/86Sr = 0.7171) indicate that the individual was born within an area between 50-100 km south or southwest from the megalithic structure (17). • Ste7 : Analysed tooth found as strayfind in the megalithic structure G¨okhem 94. No osteological data available.

2

Sample selection and DNA extraction

DNA extracts from seven Neolithic samples, five of which human (Ajv52, Ajv70, Ire8, Ste7 and G¨ok4), were chosen from a previous study (8) based on several authenticity criteria that indicated suitable preservation and low levels of contamination: • Quantitative real-time PCR (qPCR) had shown that the extracts contained more than 1,000 template copies of an 80 bp mtDNA fragment (between 47,334 and 1,497 copies for Ajv70 and G¨ok4, Table S4A in (8)) • qPCR had revealed an inverse correlation between fragment length and DNA yield (8). The degradation ratio was between 9.07 and 3.23 for Ajv70 and Ste7, see Table S4A in (8). • Roche GS FLX deep sequencing of short overlapping d-loop amplicons produced vast amount of synthetic clones that gave consistent results between clones deriving from independent DNA extractions (8). • Consensus mtDNA sequences comprising 341 bp of the d-loop yielded phylogenetically consistent results (8). • The samples Ajv52, Ajv70 and Ire8 had also yielded nuclear DNA in an earlier study (14). Two animal samples, a Harp seal (Pagophilus groenlandicus) from Ajvide (GSAj15) and a domestic dog from Link¨oping (CFBe3a), that had been handled in parallel with the human samples were used as negative controls. The GSAj15 sample was conservatively chosen based on it being the non-human sample with the most evidence of human contamination in a 6

previous study (197 template copies of the same 80 bp human mtDNA fragment referred to above; 8). The CFBe3a sample had 11 template copies of the human mtDNA fragment (8). Thus, while both animal samples had previously shown evidence of a low degree of human mtDNA contamination, the amount of human mtDNA in the animal samples was much lower compared to all human samples used in this study (8). Bones and teeth from the seven samples had been decontaminated (13) and DNA extracted (8) according to previous protocols. The outer surfaces of bones were removed by sandpaper polishing and teeth were wiped with 0.1M HCl. All samples were soaked in 0.1M HCl, rinsed with ddH2 O (DNA free ELGA grade) and dried. Between 50-200mg bone powder was obtained per sample with a Dremel automated drill and soaked in 0.5% sodium hypochlorite for 15 min and rinsed three times with LiChrosolv water (Merck, Darmstadt, Germany). The powder was subsequently incubated for 24 h at 55◦ C in 1.6 ml buffer containing 0.5M EDTA (pH 8.0), 1M urea and 100g/ml proteinase K. The sample was then briefly spun at 2,000rpm for 5 min before transferring the supernatant to an Amicon Ultra-15 centrifugal filter (Millipore, MA, USA) which was centrifuged at 4,000 g for 15 minutes, or until 50-100µl remained in the column. The extract was then purified using QIAquickT M silica-based spin columns (Qiagen, Hilden, Germany) based on the method of (51). First, the extract was added to a spin column together with approximately 5 times its volume of PB buffer. Following centrifugation at 13,000rpm for 1 minute, 750µl PE buffer was added and the column was centrifuged as before until there was no visible liquid left. DNA was then eluded by adding 2 X 45µl EB buffer to the column membrane and centrifuging for 1 minute at 13,000 rpm.

3

Library preparation and sequencing

Paired-end Illumina libraries were built on all samples following the manufacturer’s protocol (Illumina Paired-End Sample Prep Kit), with the given modifications. The nebulization step was bypassed due to the fragmented nature of ancient DNA. Adaptor ligation was performed using 1µl of Index PE Adaptor Oligo Mix (Illumina Multiplexing Sample Preparation Oligonucleotide Kit). Libraries were amplified using both Phusion polymerase (Finnzymes, Espoo, Finland) and Platinum Taq DNA Polymerase High Fidelity (HiFi) (Invitrogen, Carlsbad, CA), the former of which is known to not faithfully amplify molecules that contain uracil residues. The set-up for the first round of amplification using Phusion polymerase was as follows: 5µl DNA library, 1x Phusion HF buffer, 200µM dNTP each (Invitrogen, Carlsbad, CA), 500nM Multiplexing PCR primer 1.0, 10nM Multiplexing PCR primer 2.0, 500nM Index PCR primer, 0.02U/µl Phusion DNA Polymerase and water up to 50µl (primers obtained from the Illumina Multiplexing Sample Preparation Oligonucleotide Kit). Cycling conditions were: 98◦ C for 30 seconds; 18 cycles of 98◦ C for 10 seconds, 65◦ C for 30 seconds, 72◦ C for 30 seconds; and, a final extension at 72◦ C for 5 minutes. The Hifi PCR was set up in two successive amplification rounds, with the first amplification protocol as follows: 5µl DNA library, 1x High Fidelity PCR buffer, 2mM MgSO4 , 200µM dNTP each (Invitrogen, Carlsbad, CA), 500nM Multiplexing PCR primer 1.0, 10nM Multiplexing PCR primer 2.0, 500nM Index PCR primer, 1 unit of Platinum Taq High Fidelity and up to 50µl water. Cycling conditions for the HiFi amplification were: 94◦ C for 4 minutes; 18 7

cycles of 94◦ C for 30 seconds, 60◦ C for 30 seconds, 68◦ C for 30 seconds; and, a final extension at 72◦ C for 7 minutes. PCR products from this first amplification round were purified through QIAquick spin columns (Qiagen, Hilden, Germany), from which 5µl of library was used to set up a second HiFi amplification round using the same PCR set-up as above and similar cycling conditions except that the number of cycles was increased to 22. All Phusion and HiFi (second round) PCR products were purified using QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) with no modifications to the provided protocol and eluted in a final volume of 30µl. Libraries were sequenced on at least one lane each on the Illumina GAIIx platform.

4

Data preprocessing and alignment

Raw data was analyzed using 1.7.0 of CASAVA and 1.8.0 of the OLB (offline basecaller). To avoid possible contamination that could have entered the library outside clean room facilities, we stringently discarded all reads that did not have the exact correct index sequence (Table S2). The retained fraction was generally 96%-99% in all libraries except the one from the seal, for which the fraction was only 48%. In addition, we trimmed remaining adaptor sequences, again requiring an exact match of a portion of the adaptor sequence and that it was located in the 5’ end of the sequence read. Sequencing errors can lead to untrimmed adaptor sequences when only exact matches are identified and trimmed, but these should have minimal effect on our analysis due to further filtering criteria applied below.

4.1

Mapping

We mapped all sequence reads to the human reference genome (UCSC hg18) with the mitochondrion replaced by the Cambridge reference sequence (rCRS). We used Stampy-1.0.10 (18), an accurate mapper for Illumina sequence reads used e.g. for indel discovery in the 1000 genomes project (25). We ran Stampy in ’sensitive’ mode under default parameters. Duplicate reads were filtered based on mapping positions (’rmdup’) using the samtools package (52). The fraction of reads mapping to the human genome (Table 1 in the main text) is similar to the fraction obtained from the best preserved Neandertal fossils (53,54), but lower than a permafrost preserved modern human of similar age (11).

4.2

Stampy performance for ancient DNA

To assess Stampy’s applicability to ancient genomic data that is affected by e.g. postmortem nucleotide misincorporations (19), we compared the fraction of mapped reads obtained by Stampy and 50,000 randomly subsampled reads from the same library that were aligned using megablast (55) to an extensive database (hg18, panTro2, NCBI ”other genomic” and ”ref seq”). We used a word size of 12 and required an e-value of less than 0.001. Aligning against a large database with genomic information for several vertebrates closely related to humans together with the high sensitivity of megablast allows distinguishing between sequences that have their absolute best match to humans from sequences that align to humans as well as other organisms. We find that the fraction of human reads obtained for the 5 human 8

samples sequenced using both Hifi and Phusion polymerase is highly similar when mapping with Stampy and megablast (Fig. S2).

4.3

Ste7 resequencing

One of the five human samples, Ste7, had a significantly lower fraction of reads mapping to the human genome than the other four samples. Attempting to remedy this, a new library was built for this sample using the GS Rapid Library Prep Kit (Roche, Branford, CO) and following the provided protocol with some modifications. 1µl of a 5µM dilution of the Index PE Adaptor Oligo Mix was added to the end-repaired and A-tailed DNA fragments with 1µl of RL ligase, and the adaptor ligation reaction was carried out at 25◦ C for 15 minutes. Adaptor-ligated libraries were purified using MinElute spin columns (Qiagen, Hilden, Germany) and eluted in 30µl of EB buffer. The HiFi PCR protocol and cycling conditions outlined in section 3 were followed for the new Ste7 library with the exception that two reactions with 3µl DNA library were set up for each of the two PCR rounds, and, shorter primers than the ones provided by Illumina were used in both the amplification rounds in order to minimize the formation of primer dimers (see [56] for primer sequences, Index 2). The two PCR reactions after the second amplification round were pooled together and gel purified using the E.Z.N.A gel purification kit (Omega Bio-Tek) and eluted with 30µl elution buffer. The gel-purified library was sequenced on six additional Illumina GAIIx lanes. However, we found that saturation in the number of unique sequences had been reached, as the total number of unique human reads did not increase proportional to the amount of sequencing (Table S3). This was also reflected in the number of new SNPs obtained as data from each sequencing lane was added. On the basis of these observations and the increased error rate in Ste7 data (see section 7), this individual was excluded from all population genetic analyses.

5

mtDNA analysis

The 5 Neolithic samples had previously been assigned to mtDNA haplogroups based on 341 bp of the mitochondrial d-loop (8). To authenticate our data and, if possible, obtain better resolution of the haplogroups, we assembled mitochondrial genome sequences and compared the results with those obtained previously.

5.1

Assembly

We extracted all reads that aligned to the human mitochondrion (rCRS), and called a consensus sequence using the samtools package with default parameters assuming a haploid model. We find that the mtDNA genome coverage shows little correlation with the total amount of human DNA obtained from each sample (Table S4), which is in line with other observations of highly variable mtDNA-nuDNA ratios in ancient DNA extracts (57).

5.2

Haplogroup assignment

The haplogroup affiliation of each sample was assigned using HaploGrep (58) and PhyloTree version 12 (http://www.phylotree.org; 59 ). The mutations are described relative to the rCRS (60). The haplogroup affiliation of all samples corresponded to earlier results. The 9

increased amount of sequence data added further support, and in three cases, the place of the sample in mtDNA gene tree was refined down to subhaplogroup level. Two of the Neolithic hunter-gatherer samples, Ire8 and Ajv70, were previously classified as U4/H1b based on substitutions at np 16093 and 16356 (8). The mitochondrial genome of Ire8 displays 22 of the 26 mutations expected for subhaplogroup U4d (Table S5). The missing mutations are T195C (that define U4’9 together with G499A and T5999C), T4646C (that define U4 together with A6047G, C11332T, C14620T, T15693C, T16356C) and 2405.1C (that define U4d together with T629C). The consensus sequence lacks coverage for A8860G (which together with A263G and A15326G define H2a2a). The mutational pattern for the Ajv70 mitochondrial genome also corresponds to that of haplogroup U4, either within subhaplogroup U4d or U4a2a. In both cases, Ajv70 displays 23 of the 26 expected mutations but lacks, as do Ire8, the T195C and 2405.1C mutations and sequence data for A8860G. The missing mutations leading to U4a2 are T195C and C8818T (defining U4a). However, the sample does display the T310C substitution that defines U4a2. The third Neolithic hunter-gatherer sample, Ajv52, was previously assigned to haplogroup V based on a transition at nucleotide position (np) 16298 (8). The new data further provide 10 of 12 mutations leading to V (Table S5). The two expected transitions that are missing are A8860G, for which there is no sequence data, and A4769G (defining H2a). The Neolithic farmer sample, G¨ok4, was previously classified as belonging to haplogroup H as no d-loop mutations were observed (8) and the shotgun sequence assembly provided four out of six expected mutations that further supported the classification. The transition at np 8860 is missing as there is no sequence data for that position and the transition at np 4769 display the ancestral allele state. The second Neolithic farmer sample, Ste7, had previously been assigned to haplogroup T as it displayed transitions at np 16126-16294-16304 (8). The new assembly could further place the sample within subhaplogroup T2b3 as 28 of 32 expected mutations were present. The ancestral allele state was present for A8860G, A4769G and C16296T (that defines T2 together with A11812G and A14233G). No sequence data were available for mutation G5147A (defining T2b together with G930A and T16304C). Further, Ste7 seems related to T2b3a, as the sample displays one (C151T) of four (151T, 152C, 5656G and C16292T) polymorphisms that define the subhaplogroup. The number of non-diagnostic mutations (mutations not observed in PhyloTree) were considerably higher in assemblies with lower coverage (G¨ok4 had 92 such mutations compared to 10, 19, 21 and 26 for Ajv70, Ajv52, Ste7 and Ire8 respectively) (Table S6). Note that the relative sequence error estimates presented for the autosomal data (see section 7) are for positions that are already known to be polymorphic, and thus the total fraction of errors for low-coverage data when taking all possible positions into account is much greater. The majority of the mtDNA polymorphisms that have not been documented previously are likely erroneous, and they display the expected relationship to coverage, with the mtDNA genome assembly with the lowest coverage (G¨ok4) also having the highest number of this type of polymorphisms (Table S6). In addition, the mtDNA genomes display a third category of polymorphisms: mutations that are observed in association with other haplogroups (Table S5). The samples displayed between 10

zero and nine such mutations. These were all transitions, found between one and 36 times at widely dispersed positions in the phylogenetic tree. However, some of these could represent true polymorphisms in the Neolithic individuals. For example, both Ire8 and Ajv70 displayed transitions at np 16093 in the previous study (8). Other polymorphisms in this category are likely due to errors, for example the transitions at np 16145 and 16167 in the G¨ok4 mtDNA, for which the previous study showed no support.

6

Autosomal data processing

For our population genetic analyses, we obtained haploid genotypes for all autosomal positions that overlapped with SNPs in the particular reference dataset (see below and Fig. S3) with the following criteria: • We retained reads with mapping quality greater than 30. • We retained bases with quality greater than 15. • We excluded positions where a T was found in the sequencing read and a C in the reference human SNP data set. • We excluded positions where a A was found in the sequencing read and a G in the reference human SNP data set. • We excluded positions with more than two alleles (see Table S7 and Table S8). • We excluded positions with gaps or an N in the sequencing read. • We randomly sampled one read from positions covered by multiple reads. We base all autosomal population genetic analyses on known human single nucleotide polymorphisms (SNPs) from five different reference sets. Compared to sequence-based analyses, the advantage is that most sequencing errors (which are prevalent in second-generation sequencing data, in particular from ancient DNA) will be weeded out and therefore have a much smaller impact on the analyses. Haplotypes and missing genotypes were estimated for each dataset of extant humans using fastPHASE (61) version 1.4.5 (except for the data that were already provided with phase information, see below). The number of haplotype clusters was set to 25, and we used 25 runs of the EM algorithm. This analysis was used to generate a ”best guess” estimate of the true underlying patterns of haplotype structure. When multiple data sets were to be combined, we merged them separately based on position, excluded A/T and G/C SNPs, and flipped genotypes according to data sets that were provided with hg18 strand orientation.

6.1

HM3+FINHM+HGDP

We obtained phased Finnish HapMap (FINHM) data from 81 individuals in two groups (FIN and LSFIN) (ftp.fimm.fi/pub/FIN HAPMAP3/; 22 ), phased hapmap3 (HM3) genotypes from 7 populations (CEU; TSI; GIH; JPT+CHB; MKK; LWK; YRI; ftp.ncbi.nlm.nih.gov/ hapmap/phasing/2009-02 phaseIII/HapMap3 r2/; downloaded 30 nov 2010, 23 ), and 11

phased genotypes for 938 individuals from the Human Genome Diversity Project (HGDP; http://hgdp.uchicago.edu/Phased data/; 21, 62 ). To resolve strand issues in the data sets, we flipped SNPs to hg18 strand orientation for HGDP, FINHM, and HM3 data separately, and intersected the 1,163,280 SNPs common to HM3 and FINHM with the HDGP Illumina SNPs. We excluded SNPs with MAF≤0.5%, leaving a total of 519,951 SNPs.

6.2

FINHM+POPRES

We merged the FINHM data set with a data set consisting of 1385 individuals from the POPRES collection that had been genotyped on Affymetrix 6.0 arrays (24, 63). We used 2 outlier iterations in EIGENSOFT with pairwise r2 limit of 0.2 to exclude 7 individuals in the FINHM data that were identified as outliers on the first or second eigenvector (LSFIN2, LSFIN22, FIN12, LSFIN10, LSFIN30, LSFIN40, LSFIN21, LSFIN28). We excluded SNPs with MAF≤0.5% to a final total of 279,344 SNPs.

6.3

1KGPomni

We used 2,379,175 autosomal SNPs genotyped on Illumina 2.5 Omni arrays (25) from 241 individuals from 5 European populations (CEU, TSI, FIN, GBR, IBS). We performed 5 outlier identification iterations in EIGENSOFT 4.0 with pairwise r2 limit of 0.2, and exluded 3 CEU outliers (NA12329, NA06989, NA06984) based on visual inspection of the PCA results. We excluded SNPs with MAF≤0.5%.

6.4

1KGPomni+Sweden

We phased genotype data from 353,565 autosomal SNP typed in 1,525 Swedish individuals (28) as described above. This data set was then merged with 1KGPomni (see above) in order to flip to hg18 strand orientation, excluding all G/C and A/T SNPs, resulting in 250,725 autosomal SNPs that were present in both data sets. We excluded SNPs with MAF≤0.5%.

6.5

Complete genomes

SNP data genotyped using arrays suffers from ascertainment bias, and reports a limited number of polymorphisms for each individual. We attempted to circumvent these issues by identifying SNPs between 9 human genomes sequenced to high-coverage. These consisted of the 7.5X genome of Craig Venter (64) obtained with capillary seqeuncing technology together with 30X genome assemblies obtained with Illumina sequencing technology from 2 individuals from Utah of European descent (25), one individual of Chinese descent (65), one individual of Korean descent (66) and two Yoruba individuals from Ibadan, Nigeria (25), and finally the genome of an individual of Northern European ancestry sequenced with Helicos technology (67). We added data on the chimpanzee allele (panTro2; 68, 69 , and excluded hypermutable CpG sites to a total data set comprising 6,001,729 SNPs. To avoid biases from using different sequencing platforms and genotyping methodologies, we restricted all population genetic analyses to the seven genomes sequenced by Illumina technology (the Venter (64) and Patient 0 (67) genomes were excluded), and used genotypes that had been

12

called by the 1000 genomes project and filtered for mapping uniqueness and genotyping quality (25 ; ftp://ftp.sanger.ac.uk/pub/rd/humanSequences/; retrieved April 2011). We extracted genotypes from the Neolithic data separately for each sample as described above. We note that ascertainment bias also affects this data since we only use sites that have been found to be polymorphic among the 9 diploid genomes. We therefore ignore potential sites where the Neolithic data has an ancestral allele (shared with the chimpanzee) that is derived in all the 18 chromosomes from the modern genomes. This would lead to an underestimation of population divergence time between Neolithic individuals and modern genomes (see section 12). However, these sites are rare, and are more susceptible to sequencing errors compared to sites known to be polymorphic in the human population.

7

Relative sequencing/alignment error rates

To investigate relative sequencing error rates in sequences from different Neolithic samples, we counted the number of sites in which the Neolithic individual had a third allele that had not been seen in the 9 high-coverage human genomes and the chimpanzee. This fraction of sites comprises positions in which there is a sequencing or alignment error in the Neolithic data, and possibly also mismatches caused by sequencing and alignment errors in the modern data. Regardless, these sites should be informative on the relative rates of sequencing and alignment errors between the different Neolithic data sets. We find that Ste7 has an approximately 3 to 7 times inflated error rate compared to the other 4 samples (Table S7). This could possibly be due to the lower fraction of reads mapping to the human genome, which could cause a proportionally higher number of reads within this fraction to be the result of mapping error. We also find similar results when computing the fraction of triallelic sites using the 1KGPomni genotype data (Table S8), with Ste7 having approximately 2 to 3 times as high error rate as the other individuals. Error rates for the remaining 4 Neolithic individuals are rather homogeneous, but slightly inflated for the Ire8 and G¨ok4 individuals.

8

PCA and Procrustes analysis

A central problem for ancient genomic projects is the large fraction (usually >95%) of retained shotgun sequences that are comprised of non-endogenous sequences (e.g. from microbes in bone material [57 ]). This has the consequence that it is very difficult to obtain high-coverage data even from relatively well-preserved material (53, 54). The data from the four different Neolithic individuals presented here cover only about 1%-3.5% of the human genome each, resulting in few overlapping positions that could be used for direct population genetic comparisons between individuals. However, the vast amount of reference data for modern human genetic variation offers an opportunity of indirect comparison, where each individual is analyzed together with a modern reference data set separately, whereafter the individual analyses are considered jointly. Here we present such an approach based on Procrustes analysis (70) that is tailored for joint analysis of non-overlapping data from different individuals using principal component analysis or other methods based on the same conceptual framework (71). 13

8.1

Implementation

Under the assumption that the genealogical history of each genetic locus–although nonoverlapping–represents a random outcome of the same ancestral process, the relationship between non-overlapping data from different individuals from the same population (i.e. individuals with similar ancestry) and individuals from other populations is largely independent of which loci are used. We transformed PC1 and PC2 from each individual analysis to match the PC1 and PC2 configuration obtained using only the reference individuals (using all SNPs in the data set, and where each ancient (Neolithic) individual was given coordinates (0, 0)) (Fig. S4). As long as the ancient individual does not heavily influence the loadings of the reference individuals, the PC1 and PC2 loadings for a given reference individual in an analyses of a subset of SNPs from the complete set should just reflect the sampling variance in the number of SNPs. Therefore, we take the mean of PC1 and PC2 obtained in each separate analysis for each reference individual, but retain the transformed PC1 and PC2 configuration for ancient individuals. All principal component analyses were performed using EIGENSOFT 4.0 (20), with one SNP from each pair with r2 > 0.2 excluded. Results are shown in Figs. S5-S9.

8.2

Validation using Saqqaq data

As a proof of concept, we analyzed sequence reads from two Illumina libraries prepared from a paleo-eskimo hair sample (”Saqqaq”; 11 ) that had been amplified by both Hifi polymerase and Phusion polymerase by (72) and mapped to the human genome using BWA (73) by Amhed Missael Vargas Velazquez and Ludovic Orlando. We used the same genotype filters as described above and obtained 23,587 SNPs from the Hifi data and 38,107 SNPs from the Phusion data that overlapped with the HM3+FINHM+HGDP data set. We used 48 non-African populations in the data set to generate a PC1-PC2 projection as in Figure 3B in (11), and used Procrustes analysis to obtain a joint analysis of the non-overlapping data from the two different libraries as described above. The results were highly consistent between the haploid low-coverage data analyzed using our pipeline and the diploid high-coverage data analyzed by (11), as well as highly consistent between the two libraries amplified by Hifi and Phusion (Fig. S10).

9

Model-based clustering

Each Neolithic individual was clustered together with extant European individuals from HapMap, Finnish HapMap, and HGDP (HM3+FINHM+HGDP) plus the Levantine Druze population using an unsupervised clustering algorithm implemented in ADMIXTURE (26). For each Neolithic individual was the clustering performed based on SNPs that overlapped between the Neolithic individual and the set of extant individual (see Table 1 in main text). All runs of ADMIXTURE were based on the ’admixture’ model, in which each individual is assumed to have ancestry in multiple clusters. Default settings were used (except for generating the random seed). For a given value of K, the number of clusters considered, 10 replicate analyses were performed for each dataset comprising of one Neolithic individual and 384 extant individuals. The 10 replicate runs of ADMIXTURE (for each set of individuals and each K) were analyzed with CLUMPP version 1.1.2 (27) to identify common modes among replicates. The CLUMPP analysis used the fullSearch algorithm for K = 2 and 3, and the Greedy algorithm with 10,000 14

random permutations for K = 4. The 10 runs of ADMIXTURE produced very similar solutions, the average similarity coefficient H 0 was greater than 0.98 for all choices of K (2, 3 and 4) and for all four Neolithic individuals. The clustering results were visualized with Distruct (74). Fig. S11 shows the clustering solutions for the four Neolithic individuals assuming 2, 3 and 4 clusters (each plot shows the average across 10 replicate ADMIXTURE runs). Figure 1D in the main text was generated by extracting the (averaged) ancestry matrix of the extant individuals from each analysis containing one Neolithic individual and averaging them using CLUMPP. These four ancestry matrices were similar (H 0 = 0.795) and the four Neolithic individuals were merged to the plot so that the original clustering solutions were respected (compare with Fig. S11).

10

Authenticity

Several observations, such as high fraction of sequences mapping to the human genome compared to animal controls and matching mitochondrial genome assemblies to previously obtained mtDNA data, point to the presence of endogenous DNA in the human sequence data. To investigate authenticity more closely we characterized two physical features that have previously been shown to differ between authentic ancient DNA and contaminating modern human molecules (19,29): (i) increased rate of thymine residues at positions where a cytosine is found in the reference sequence due to nucleotide misincorporation, and (ii) increased rate of purine residues to the 5’ end of the mapped sequence reads. However, increased rate of purines in the 5’ end can also be caused by bleach treatment (75), which was used to decontaminate the samples. We therefore base our conclusions and method for assessing authenticity in the autosomal data (see below) only on the nucleotide misincorporation pattern. Since Phusion polymerase does not faithfully replicate Uracil (11, 72, 76), these analyses only utilized data from libraries amplified by Hifi.

10.1

Patterns of molecular degradation

In each alignment to the human genome, we extracted the positions covered by the read plus 5 bp in the 5’ and 3’ from the hg18 reference sequence, and locally realigned the read and the reference sequence using MUSCLE3.8 (77). We then computed the average base frequency at different positions of both the sequence read and the sequence at and around the aligned region in the reference genome. We required reads to have a mapping quality of at least 30 and averaged the frequency over all analyzed positions. We find clear evidence of patterns (i) and (ii), which were highly homogenous among samples (Fig. S12).

10.2

Testing for autosomal contamination

The presence of endogenous DNA does not preclude that a small fraction of contamination could influence the population genetic analyses and comparisons with modern populations. To test whether contamination could be present in sufficient amounts to affect our conclusions, we devised a new approach that relies on the expectation that C→T nucleotide misincorporations are many times more frequent in endogenous molecules than contaminating molecules. By dividing the sequences into those that have and those that do not have evidence of nucleotide 15

misincorporation we can create fractions that are highly enriched for endogenous and putative contaminating molecules, respectively. Assuming significant (but not overwhelming) presence of contamination, the latter category without evidence for misincorporation is expected to still contain a portion of endogenous molecules that do not have misincorporations, but the former category with misincorporations is expected to be composed mostly of endogenous sequences. All population genetic analyses that could be subject to contamination can then be assessed for different behaviour between the two fractions. To implement this idea, we created additional sets of genotypes where we only considered sequence reads that either had, or did not have a T where the reference base was a C within the first 10 bp (in the 5’ end) (Table S9). We used the same method for local realignment as above. We performed PCA and Procrustes analysis on each separate fraction (with- and without nucleotide misincorporations) separately, together with the TSI (Tuscans from Italy) and FIN (Finnish from Finland) in the 1KGPomni reference data set, since differentiation between these two populations summarizes the main component of differentiation between Neolithic hunter-gatherers and the Neolithic farmer in our analyses (Fig. 1 in the main text). We find no evidence suggesting that contamination could influence our analyses and conclusions, with the separation and different population genetic affinities between the Neolithic farmer and the Neolithic hunter-gatherers remaining evident in all analyses (Figs. 2, S13 and S14).

11

Allele sharing analysis

We computed the mean frequency of the allele observed in Neolithic farmer data (G¨ok4), and Neolithic hunter-gatherer data (Ajv52+Ajv70+Ire8) separately in various extant populations, summing over the total number of SNPs. To create a data set representing all three Neolithic hunter-gatherers we pooled the sequence reads from all individuals and genotyped all SNPs using the procedure described in section 6, resulting in 12,609 SNPs that overlapped with the FINHM+POPRES data. We restricted the analysis to the 25 populations in the FINHM+POPRES data set with sample size of at least 8 chromosomes, and computed the frequency of the allele observed in either Neolithic hunter-gatherers or the Neolithic farmer in extant populations by averaging over 8 randomly sampled chromosomes (without replacement). This procedure was replicated 100 times (for each SNP) and the mean across the 100 repititions is reported in Table S10. Spatial interpolations of Neolithic allele frequencies were performed using the Krig function in the R package ”fields” (78). Coordinates for POPRES populations were obtained with Geonames (http://www.geonames.org/). For some populations (Sweden, Russia, United Kingdom) the coordinates were placed in the capital (63). FINHM coordinates were placed in Helsinki (FIN) and Oulo (LSFIN).

12

Estimation of population divergence time from complete genomes

To provide an alternative view of the genetic relationship between the Neolithic individuals and modern humans, we computed the population divergence time between each Neolithic individual and one of seven complete genomes from extant humans. We used the same framework for estimation of population divergence time using reference genomes as in (32), with the 16

main difference being we estimate population divergence time to each modern diploid genome separately, using the chimpanzee as an outgroup. To implement the method, we compare the two gene copies in a modern genome to the Neolithic gene copy and the chimpanzee. We infer a genealogical topology only in those cases where two copies are identical, but differ from the other two (one of which is the chimpanzee–representing the ancestral state), which are also identical. We then count sites where the two copies from the modern genome share a derived allele not found in the Neolithic sample as ’concordant’ (assuming a model where the Neolithic individual splits from the modern population). The remaining two configurations result in a ’discordant’ topology. When two copies from the same population (in this case a single individual) are sampled in this way, the internode time T in the framework outlined in (32) becomes equal to the total divergence time between the modern population and the Neolithic population. Since the expected fraction of concordant topologies increases proportional to the time since population divergence, we can estimate T (measured in units of 2Ne generations where Ne is the effective population size) and obtain confidence intervals in a maximum likelihood framework using a coalescent model (e.g. 79 ). We have the log-likelihood function 2e−T Log(L(T )) = Nconc × 1 − 3

!

− Ndisc ×

2e−T 3

where Nconc is the number of concordant genealogies, and Ndisc is the number of discordant genealogies (the sum of the of two types) (79). For estimating a joint population divergence times for all the three Neolithic hunter-gatherers, we pooled the sequence reads from the three individuals (Ajv52+Ajv70+Ire8) and obtained genotypes as outlined above.

12.1

Divergence from genomes of Northwest European ancestry

We find that the estimated population divergence time Tˆ between all Neolithic humans and modern individuals of Northwest European ancestry (CEU) is on the order of 0.01, with point estimates from 0.0069 to 0.0241. Comparing estimates for the Neolithic farmer to the pooled Neolithic hunter-gatherer data obtained using NA12891 we obtain Tˆ = 0.0150 (95% ML CI: 0.003-0.027) and Tˆ = 0.0155 (95% ML CI: 0.008-0.023), respectively (Table S11). The estimates obtained using NA12892 are more disparate but with overlapping confidence intervals, specifically we obtain Tˆ = 0.0241 (95% ML CI: 0.012-0.037) for the Neolithic farmer and Tˆ = 0.0103 (95% ML CI: 0.003-0.018) for the Neolithic hunter-gatherers. Assuming a recent European effective population size of 2Ne =10,000 (80) and a 25 year generation time, the estimates obtained for NA12891 would translate to a divergence time of 0.015 × 10, 000 × 2 × 25 = 7, 500 years (95% CI:1,500-13,500) for the Neolithic farmer and 0.0155 × 10, 000 × 2 × 25 = 7, 750 years (95% CI:4,000-11,500) for the Neolithic huntergatherers. The estimates for NA12892 are 12,050 years (95% ML CI: 6,000-18,500) for the Neolithic farmer and 5,150 years (95% ML CI: 1,500-9,000) for the Neolithic hunter-gatherers. The maximum and minimum values for each Neolithic group obtained using the two different CEU genomes are reported in the main text, together with the mean of the two point estimates. 17

If the history between the modern genome and a Neolithic individual was more complex than an instantanuous divergence model (79), the interpretation of these estimates would have to be modified accordingly. If, for example, the modern genome would be the result of admixture between the Neolithic individual and some other population, Tˆ would be a function partly of genetic drift in the two source populations since their divergence, as well as the respective fractions they contributed to the ancestry of the Neolithic individual. With this in regard, these estimaed divergence times are in line with the age of the specimens, and the similarity of the estimates for the Neolithic farmer and the Neolithic hunter gatherers in the comparison with the NA12891 genome is in line with an estimated fraction of 51.7% Neolithic farmer ancestry in the CEU population (Table S15). The difference between the two genomes, where NA12892 appear to be non-significantly closer to Neolithic huntergatherers than the Neolithic farmer, could possibly be due to different ancestry in NA12892 compared to NA12891. This difference is however within the confidence intervals of the estimates (Fig. S15). While these estimates can be inflated by sequencing errors (32), the estimated error rates for G¨ok4 are not widely different from the Neolithic hunter-gatherer data (see section 7). On the other hand, it has been shown that the ancestral effective population size influences this method when single SNPs are used to infer genealogical topologies, but this bias is small for divergence times on the order of 0.01 (32).

12.2

Divergence from genomes of East Asian and West African ancestry

In contrast to the results obtained using extant European genomes, we find that the inferred population divergence times from East Asians is significantly higher (0.12-0.18; Fig. S16; Table S12). This likely reflects both a more ancient divergence between ancestors of Europeans and East Asians as well as greater genetic drift in East Asians than Europeans (81). However, we are still unable to detect any significant difference between the Neolithic farmer G¨ok4 and the three Neolithic hunter-gatherers. We obtained similar results when comparing the Neolithic data to three genomes from individuals of West African ancestry (Fig. S16, Table S12).

13

Modelling European population history

Here, we use the observed patterns of differentiation between Neolithic farmers and huntergatherers to investigate models of population history relating the Neolithic individuals to modern populations.

13.1

Testing population topology using genealogical concordance

To build a tree-like framework relating the Neolithic individuals to modern populations, we take a similar approach as (82), testing the support of individual four taxon topologies and considering the results jointly. Consider a case where a single gene copy is sampled from

18

each of four populations A, B, C and D as for the divergence time estimation above, there are only three possible genealogical topologies, of which the one that mirrors the population topology (i.e. the ’concordant’ topology) is expected to be most frequent. By scoring which genealogical topology is supported by the configuration of alleles at each site (assuming an infinite sites model of mutation), we use a ’concordance’ test of the hypothesis that the topology (A, B), (C, D) is significantly more frequent than the topologies (A, C), (B, D) and (A, D), (B, C). We conduct this test by computing the excess of the putatively ’concordant’ category over the second most frequent category (the most frequent ’discordant’ category, denoted disc1), as C=

Nconc − Ndisc1 Nconc + Ndisc1

To obtain standard errors (SEs) for this statistic, we use a block jackknife procedure where we divide the genome in 130-140 contiguous blocks, and obtain Z-scores by normalizing with the SE. If the C-statistic is significantly positive (> 3 SEs from 0), we consider the topology supported. We use European and Levantine (Druze) populations in the HM3+FINHM+HGDP data set, and add San and Yoruba as outgroups. Most tests support the results obtained from the previous analysis, suggesting a close relation between Neolithic hunter-gatherers and Northern European populations (such as Finnish) on one hand, and Neolithic farmers and Southern European populations (such as Sardinians), on the other hand (Table S13). Based on all the 1,271 significant four-wise combinations of tests in the data (108 that included the Neolithic hunter-gatherers and 97 that included the Neolithic farmer), we constructed a neighbourjoining tree using an algorithm in the clann software package (83), and find that the results of this clustering is largely similar to the results based on PCA and ADMIXTURE analyses (Fig. S17). This approach provides a means for clustering haploid data from populations and representing the result as a topology, but we note that the neighbour-joining algorithm allows for conflicts between inferred topologies, and it utilizes information such as how many times two groups appear in the same significant concordance test (for example in tests including the outgroup) rather than explicitly testing if there is support for each node in the inferred topology. In the future, more sophisticated algorithms for constructing the supertree might allow population topologies to be investigated in a more formal setting that is directly related to population genetic models of divergence with- or without gene flow.

13.2

Gene flow between a population related to G¨ ok4 and extant European populations

While the concordance tests are designed to portray the major pattern underlying the data, human population history often comprises more complex events such as migration and admixture. The observation that a Neolithic farmer individual in Scandinavia 5,000 years ago has a similar genetic makeup as modern-day Southern Europeans suggest that it belonged to a population that expanded in concert with agriculture, and as such may have interacted and admixed with resident populations. Observing that Neolithic hunter-gatherers tend to cluster 19

in the vicinity of Finnish individuals and the Neolithic farmer clusters with individuals from southeastern Europe, we considered the possibility that other European populations in the HM3+FINHM+HGDP data set are derived from admixture between resident hunter-gatherer groups and expanding farming populations related to G¨ok4. Using the topology inferred above as a framework, we tested the hypothesis of no gene flow in the unrooted population topology (Outgroup, G¨ ok4), (X, LSF IN ), were Outgroup is either Papuans, Native Americans (Surui, Karitiana and Colombians; 21,84 ), Yoruba, or Levantine Druze. X is one of 9 European populations in the HM3+FINHM+HGDP data set. We restrict the analysis to the 7,712 SNPs not affected by putative nucleotide misincorporations. We compute the 4-population test based on the estimated allele frequencies in each population as in (31, 82). Pn

D=

i i=1 [pOutgroup (1 Pn i i=1 [pOutgroup (1

− piG¨ok4 )piX (1 − piLSF IN )] − [(1 − piOutgroup )piG¨ok4 piX (1 − piLSF IN )] − piG¨ok4 )piX (1 − piLSF IN )] + [(1 − piOutgroup )piG¨ok4 piX (1 − piLSF IN )]

where pi is the allele frequency at locus i. The statistic is summed over all n loci and we compute a standard error using a block jackknife procedure over 132 windows with 50 SNPs in each. We find evidence of a violation of the basic population topology using all outgroups (Table S14). The signal is consistently (but not always significantly) negatively skewed regardless of which outgroup is used. This suggests that gene flow from Asia into Northern Europe or from Africa into Southern Europe is not enough to explain the data, since the signal is seen across all populations when using outgroups from both these regions.

13.3

Estimation of Neolithic farmer ancestry in European populations

Based on the 4-population tests and the available data, we considered a model (Fig. S18) where the late-settlement Finnish founder population LSFIN (22) has received a negligable amount of Neolithic farmer-related gene flow since the Neolithic expansion. Secondly, we assume that G¨ok4 instantanuosly split from the ancestors of extant Druze in the Levant (Druze was chosen over Palestinians due to low degree of recent African admixture (85). Using three Native American populations with no evidence of European admixture (Surui, Karitiana and Colombians; (21, 84) as an outgroup, we estimate the fraction of G¨ok4-related ancestry in an extant European population X, as a mixture between populations related to LSFIN and G¨ok4, using the f4 -ancestry estimation framework (31, 85). Using the notation of (85), we compute the statistic Pn

f4 (Outgroup, Druze; G¨ ok4, LSF IN ) =

i i i i i=1 (pOutgroup − pDruze )(pX − pLSF IN ) Pn i i i i i=1 (pOutgroup − pDruze )(pG¨ ok4 − pLSF IN )

where pi is the allele frequency at locus i in each population, and we compute the statistic summing over all n loci. To obtain standard errors, we used a block jackknife, with 132 contiguous windows with the same number of SNPs. To also estimate ancestry in the Swedish population, we merged the HM3+FINHM+HGDP data with the Swedish data, which resulted 20

in 4,817 SNPs that could be used in the analysis. We divided the Swedish data into individuals from Southern- (743 individuals), Central- (545 individuals), and Northern Sweden (237 individuals). A number of possible model violations could influence these estimates. Most notably, gene flow from East Asia into the European populations (e.g. the Finnish, Russian, Adygei and Druze populations) in the model. To assess the robustness of the model we replaced the Native Americans used as outgroup with Papuans, JPT+CHB (Japanese and Chinese ancestry) or Uygur (Central Asian), and obtained correlations with the original estimates of 0.99, 0.98 and 0.99 respectively (Fig. S20). While the absolute values of ancestry varied with the outgroup, we found that using Native Americans was the only case when estimates did not exceed 100%.

14

The Neolithic hunter-gatherer gene pool is not fully represented in modern Europeans

To maximize the population genetic resolution of the genetic profile of the Neolithic huntergatherers, with special regard to their relationship with modern Scandinavians, we analyzed 14,113 SNPs obtained by merging the Swedish and HM3+FINHM+HGDP data, and calling genotypes based on all sequences available from the available Neolithic hunter-gatherer data (Ajv52, Ajv70 and Ire8), with the same criteria as above. To avoid the issue of a much larger sample size in the Swedish data, we randomly sampled 20 individuals from each of Southern, Central, and Northern Sweden (G¨otaland, Svealand, and Norrland; 28 ). We performed PCA and model-based clustering using ADMIXTURE as above but without Procrustes transformation or CLUMPP analysis. We find that Neolithic hunter-gatherers fall outside the variation of extant Europeans in both these analyses (Figs. S21 and S22).

21

Figure S1: The Ajvide 70 (Ajv70) individual. Photograph by Johan Norder¨ang

22

14 12 10 6

8

●●

● ●

4

● ●

2

% Human reads (megablast)





R = 0.96, P < 1e−5

● ●

2

4

6

8

10

12

14

% Human reads (Stampy)

Figure S2: Benchmarking of Stampy by comparison with megablast.

23

Gökhem (TRB, Neolithic farmers) Gotland (PWC, Neolithic hunter−gatherers) 1KGPomni+Sweden FINHM+POPRES HM3+FINHM+HGDP 1KGPomni

Figure S3: Sample locations of all data sets used in the study. Some sampling locations are represented in multiple data sets (e.g. Spain), and thus have overlapping symbols.

24

Procrustes errors

0 −20

Dimension 2

20

40

● ● ● ●● ●● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ●● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ●●●●● ● ●● ● ● ● ● ●● ●● ●● ● ● ● ●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●●● ● ● ●●● ●● ●●● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ●●● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ●●● ● ● ●●●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●●●●● ● ● ●●● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●●●●● ●●●●● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●●●● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●●● ● ● ●●● ● ● ● ● ● ● ●●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ●●● ●● ● ● ● ●● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ●● ● ●● ● ● ● ●●●● ● ● ●● ● ● ● ● ● ● ● ● ●●●●● ●● ● ●●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ●●●● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ●● ● ●● ●●● ● ● ●●●● ●●● ● ● ● ● ● ● ● ● ●● ●● ● ●●● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●

−40

−20

● ●





0

● ● ●● ●●●●● ● ● ● ●●● ● ●● ● ●●●●● ● ● ●● ● ● ● ●● ● ● ●●● ●● ●●●●●● ●● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●●● ●● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ●●● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ●● ●● ●● ●● ●● ● ●● ●●● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ●●● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●

20

40

60

Dimension 1

Figure S4: Procrustes transformation of (PC1,PC2) coordinates from two different data sets. Arrows represent the residuals between reference (”target”) and the coordinates for rotation after transformation.

25

C

Merged

−0.04 −0.06

Africa −0.01

0.01

0.03

−0.03

−0.01

PC1

D

E

0.03

−0.02

F

Gök4

0.02

0.04

0.02

0.04

0.02

Ire8

0.00 PC2

−0.04

−0.04

−0.02

PC2

0.00

0.00 −0.02

0.00 PC1

0.02

0.02

Ajv70

−0.06

−0.04

PC2

0.01 PC1

−0.02

−0.03

−0.02

PC2

0.00

0.00

−0.04

−0.04

−0.02

PC2

0.00

C/S Asia Europe Middle East

Ajv52

0.02

0.02

East America Asia Oceania

PC2

B

Modern individuals

−0.02

0.02

A

−0.03

−0.01

0.01

0.03

PC1

−0.02

0.00 PC1

0.02

0.04

−0.04

−0.02

0.00 PC1

Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S5: PCA and Procrustes analysis using individuals sampled worldwide in the HM3+FINHM+HGDP data set. A) PCA of modern individuals only. B) Procrustestransformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

26

C

Merged

FIN

FIN

0.05 PC1

NorthItalian TSI Adygei

NorthItalian TSI Adygei

Sardinian

−0.05

Sardinian

−0.05

−0.05

Russian LSFIN Orcadian Basque French

Middle East

Middle East

−0.10

Middle East

FIN

Russian LSFIN Orcadian Basque French

0.05 PC1

NorthItalian TSI Adygei

Sardinian

0.00

Russian LSFIN Orcadian French Basque

Ajv52

0.10

0.10

0.10 0.05 PC1

0.00

B

Modern individuals

0.00

A

0.00

0.10

0.20

−0.15

−0.10

−0.05

PC2

D

0.05

0.10

−0.15

F

Gök4

Adygei

0.05

Adygei

Middle East

−0.05

0.00

0.05

PC2

0.10

FIN Russian LSFIN Orcadian Basque French NorthItalian TSI Adygei Sardinian

Middle East

−0.10

−0.10

−0.10 −0.15

Ire8

−0.05

−0.05

Sardinian

Middle East

0.05 0.10 0.15

0.10 NorthItalian TSI

PC1

PC1

NorthItalian TSI Sardinian

FIN Russian LSFIN Orcadian Basque French

0.00

0.05

FIN

−0.05 PC2

0.10

0.10 0.05 PC1

E

Ajv70

LSFIN Russian Orcadian BasqueFrench

−0.05

0.00

0.00

PC2

0.00

−0.10

−0.15

−0.10

−0.05 PC2

0.00

0.05

−0.15

−0.05

0.00

0.05

0.10

PC2

Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S6: PCA and Procrustes analysis using individuals sampled from Europe and the Levant in the HM3+FINHM+HGDP data set. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustestransformed PCAs of each Neolithic individual separately.

27

B Finland (LS)

−0.04

0.00

0.04

D

0.00 −0.04

Cyprus

0.08

−0.02

PC2

0.00

0.02

0.04

−0.05

E

F

Gök4

Sweden Russia Ireland Germany Ukraine Denmark Hungary

0.06

PC2

−0.02

Albania FranceRomania Spain Italy Greece Turkey Cyprus

−0.04

0.00 PC2

0.02

0.04

Finland (LS) Finland Latvia Sweden Ukraine Russia Germany Ireland Denmark Romania Hungary France Spain Turkey Italy Albania Greece

0.00

0.00

Latvia

PC1

PC1

Finland

−0.04 0.02

Ire8

0.04

0.04

0.04 PC1

−0.04

0.00

Latvia Denmark Sweden Ukraine Russia Ireland Germany Hungary FranceRomania Spain Albania Italy Turkey Greece Cyprus

0.10

0.06

0.08

Finland (LS) Finland

0.05 PC2

0.08

Ajv70

−0.02

0.00

PC2

Finland (LS)

−0.06

Latvia Denmark Ukraine Sweden Russia Ireland Germany Hungary France Romania Spain AlbaniaTurkey Italy Greece Cyprus

PC1

PC1

Latvia Sweden Denmark Russia Ukraine Ireland Germany Hungary France Romania Spain Albania Italy Greece Turkey

Finland

0.02

0.05

Cyprus

0.04

−0.04

Latvia Sweden Russia Ukraine Denmark Ireland Germany Hungary France Romania Spain Albania Turkey Italy Greece

Ajv52

Finland (LS) Finland

0.04 0.02 0.00

PC1

−0.05

Finland

0.00

C

Merged 0.08

Modern individuals

−0.10

A

0.06

Cyprus

−0.03

−0.01

0.01

0.03

PC2

Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S7: PCA and Procrustes analysis using individuals in the FINHM+POPRES data set. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

28

0.00

PC1

PC1 0.00 −0.05

0.00

0.05

0.05

0.05 −0.10

−0.10

PC2

−0.05

0.00

0.05

−0.10

E

F

Gök4

0.05

PC2

0.10

0.10

Ire8

0.05

0.05 0.00

0.05

−0.05

PC1 0.00

PC1

−0.05

−0.10 −0.05 PC1 0.00 0.05

−0.05

0.00 PC2

−0.10

Ajv70

−0.10

−0.05

PC2

−0.10

D

Ajv52

−0.05

−0.10 −0.05

−0.10 −0.05 0.05

0.00

PC1

C

Merged −0.10

B

Modern individuals

0.00

A

−0.15

−0.05 0.00 PC2

0.05

0.10

−0.10

−0.05

0.00

0.05

0.10

PC2

FIN (Finland) GBR+CEU (Gr. Britain/Utah) IBS (Spain) TSI (Italy) Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S8: PCA and Procrustes analysis using individuals sampled from Europe in the 1KGPomni data set. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

29

A

B

Ajv52

−0.02

IBS TSI

PC1

PC1

0.06

0.06

0.02

0.00

Northern Sweden Central Sweden Southern Sweden

FIN Northern Sweden Central Sweden Southern Sweden CEU GBR

0.02

−0.02

−0.06 −0.02

Northern Sweden Central Sweden Southern Sweden

0.02

PC1

C

Merged

−0.04

Modern individuals

−0.02

0.02

0.06

0.10

−0.02 −0.01

PC2

0.01

0.02

0.03

−0.04

−0.02

0.00

PC2

E

Ajv70

0.02

0.04

PC2

F

Gök4

Ire8

−0.06

−0.02

0.02 PC2

0.06

0.00 −0.01 0.01

Northern Sweden Central Sweden Southern Sweden

0.02

0.04

0.06

0.02

PC1

Northern Sweden Central Sweden Southern Sweden

0.00

PC1

Northern Sweden Central Sweden Southern Sweden

0.02

PC1

−0.02

−0.02

−0.03

D

0.00

−0.04

−0.02

0.00 PC2

0.02

0.04

−0.02 −0.01

0.00

0.01

0.02

0.03

PC2

FIN (Finland) GBR+CEU (Gr. Britain/Utah) IBS (Spain) TSI (Italy) Northern Sweden Central Sweden Southern Sweden Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S9: PCA and Procrustes analysis using individuals sampled from Europe in the 1KGPomni+Sweden data set. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately. Centroids for Southern-, Central and Northern Sweden are indicated.

30

0.04

East Asia

P H ●

0.02

America

0.00

PC1

Oceania

−0.02

Central/South Asia ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● Europe ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

−0.15

−0.10

−0.05

0.00

PC2

Figure S10: PCA and Procrustes analysis using data from two libraries prepared from a paleo-eskimo hair sample (Saqqaq; 11,72 ) using Phusion (denoted ”P”) and Hifi polymerase (denoted ”H”) comprising 38, 107 and 23, 587 SNPs, respectively, together with non-African individuals in the HM3+FINHM+HGDP data set.

31

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

K=2 Ire8 (hunter-gatherer)

Gok4 (farmer)

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

K=3 Ire8 (hunter-gatherer)

Gok4 (farmer)

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

K=4 Ire8 (hunter-gatherer)

Druze

Adygei

Tuscan

Sardinian

Italy (TSI)

North Italian

Basque

French

Orcadian

NW European ancestry (CEU)

Russian

Finland

Finland (LS)

Gok4 (farmer)

Figure S11: Model-based clustering of Neolithic and extant Europeans (plus the levantine Druze) assuming 2, 3 and 4 clusters

32

2

4

6

8

10

12

14

5

10

0.10 0.00 4

6

8

10

12

14

A T G C

0.3 0.1 0.0 −5

0

0

5

10

2

4

6

8

10

12

14

H

0.2

0.3 0.2 0.1 0

2

0.5

0.5

A T G C

0.0 −5

0

G

0.4

0.5 0.0

0.1

0.2

0.3

0.4

A T G C

0.30

0.30 0.20 0.00

0

F

0.5

14

A T G C

0.4

12

0.3

10

0.2

8

A T G

0.1

6

Gök4

0.0

4

A T G

0.4

2

D

Ire8

0.10

0.20

0.30

A T G

0.10 0

E Base frequency in reference

C

Ajv70

0.00

0.10

0.20

0.30

A T G

0.20

B

Ajv52

0.00

Base frequency in read given C in reference

A

−5

0

5

10

−5

0

5

10

Distance from 5' end of sequence read

Figure S12: Patterns of nucleotide misincorporation and fragmentation in libraries prepared from Neolithic samples and amplified by Hifi polymerase. A to D shows average base frequency in sequence reads at sites where a C is found in the reference sequence (hg18) for Ajv52, Ajv70, Ire8 and G¨ok4, respectively. E to H shows base frequencies in the reference around the 5’-end of the sequence reads.

33

B

0.2

0.4

−0.04 −0.04

0.00 0.02 0.04 0.06

PC2

D

−0.05

PC2

E

F

Gök4

0.10

0.15

PC2

0.20

0.10

−0.05

Ire8

0.10

0.08 0.05

0.05

0.05

PC1

0.00

PC1

0.04

0.00

0.00

0.00 PC2

−0.04

−0.05

Ajv70

0.05 −0.10

Ajv52

0.06 0.04 0.02 0.00

PC1

PC1 0.0

0.06 0.04 0.02 0.00

−0.05 0.00

PC1

0.05 0.10

−0.2

PC1

C

Merged

−0.04

Modern individuals

0.00

A

−0.05

0.00

0.05 PC2

0.10

−0.10

−0.05

0.00

0.05

PC2

FIN (Finland) TSI (Italy) Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S13: PCA and Procrustes analysis using individuals of Italian (TSI) and Finnish (FIN) ancestry in the 1KGPomni data set and Neolithic sequences with evidence of cytosine deamination in the first 10 bp. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

34

B

0.2

−0.06

0.4

−0.05

PC2

0.00

0.05

−0.10

E

F

Gök4

0.00

0.05

PC2

0.10

0.10

Ire8

0.06

0.06 −0.05

0.05

0.02

0.02

PC1

−0.02

PC1

−0.04 0.00 0.02 0.04

−0.10

0.00 PC2

−0.06

Ajv70

−0.15

−0.05

PC2

−0.06

D

−0.02 0.06

0.02 0.06 0.0

Ajv52

0.02

PC1

PC1

−0.02

−0.05 0.00

PC1

0.05 0.10

−0.2

PC1

C

Merged

−0.06

Modern individuals

−0.02

A

−0.15

−0.05

0.00

PC2

0.05

0.10

−0.10

−0.05

0.00

0.05

0.10

PC2

FIN (Finland) TSI (Italy) Neolithic hunter−gatherer: Ajv52 Ajv70 Ire8 Neolithic farmer: Gök4

Figure S14: PCA and Procrustes analysis using individuals of Italian (TSI) and Finnish (FIN) ancestry in the 1KGPomni data set and Neolithic sequences without evidence of cytosine deamination in the first 10 bp. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

35

NA12891 (CEU)

0.0 −2.0

−1.0

Neolithic hunter−gatherers (PWC) Pooled Neolithic hunter−gatherers Neolithic farmer (TRB)

−3.0

Relative log−likelihood

A

0.00

0.02

0.04

0.06

0.08

0.10

Divergence time T (2Ne generations)

NA12892 (CEU)

0.0 −2.0

−1.0

Neolithic hunter−gatherers (PWC) Pooled Neolithic hunter−gatherers Neolithic farmer (TRB)

−3.0

Relative log−likelihood

B

0.00

0.02

0.04

0.06

0.08

0.10

Divergence time T (2Ne generations)

Figure S15: Relative log-likelihood curves for population divergence time Tˆ between Neolithic individuals and complete genomes of European descent.

36

A

B 0.0 −1.0 −3.0

0.00

0.05

0.10

0.15

0.20

0.00

Divergence time T (2Ne generations)

C

0.05

0.10

0.15

0.20

Divergence time T (2Ne generations)

D

−2.0

−1.0

0.0

NA19239 (YRI)

−3.0

−2.0

−1.0

Relative log−likelihood

0.0

NA19238 (YRI)

−3.0

Relative log−likelihood

SJK (Korean)

−2.0

Relative log−likelihood

−1.0 −2.0 −3.0

Relative log−likelihood

0.0

YH (Chinese)

0.00

0.05

0.10

0.15

0.20

0.00

Divergence time T (2Ne generations)

E

0.05

0.10 Divergence time T (2Ne generations)

0.15

0.20

−1.0 −2.0

Neolithic hunter−gatherers (PWC) Neolithic farmer (TRB)

−3.0

Relative log−likelihood

0.0

NA18507 (YRI)

0.00

0.05

0.10 Divergence time T (2Ne generations)

0.15

0.20

Figure S16: Relative log-likelihood curves for population divergence time Tˆ between Neolithic individuals and complete genomes from East Asian and African individuals.

37

LSFIN

FIN

Neolithic hunter−gatherers

Russian Orcadian CEU French FrenchBasque

NorthItalian Sardinian San Yoruba

TSI Druze

Neolithic farmer

Figure S17: Neighbour-joining supertree of 1,271 four-taxa topologies inferred by concordance tests.

38

Gök4 Colombian Karitiana Surui

LSFIN

Population X

Druze

Figure S18: Illustration of the model used for estimating the fraction of Neolithic farmer ancestry in extant European populations.

39

Figure S19: Estimated Neolithic farmer ancestry (red) in European populations.

40

B

R = 0.99, P < 1e−13

0.4

2

0.8

3 2 Adygei 1

1.0

French CEU Orcadian

Adygei 3 2 1

0.2

0.2

1

Neolithic farmer ancestry (Outgroup: Uygur)

Adygei 3

French CEU Orcadian

NorthItalian TSI Tuscan FrenchBasque

0.8

1.0

0.6

CEU French

NorthItalian TSI

0.6

Tuscan FrenchBasque Orcadian

Sardinian

1.2

Sardinian

FrenchBasque Tuscan

0.4

Neolithic farmer ancestry (Outgroup: JPT+CHB)

1.0 0.8

TSI NorthItalian

0.2

Neolithic farmer ancestry (Outgroup: Papuans)

C R = 0.98, P < 1e−8

Sardinian

0.6

R = 0.99, P < 1e−10

0.4

A

FIN

FIN FIN Russian

Russian

0.2

0.4

0.6

0.8

Neolithic farmer ancestry (Outgroup: Americans)

1.0

Russian

0.2

0.4

0.6

0.8

Neolithic farmer ancestry (Outgroup: Americans)

1.0

0.2

0.4

0.6

0.8

1.0

Neolithic farmer ancestry (Outgroup: Americans)

Figure S20: Substituting Native Americans as an outgroup yields highly correlated estimates of Neolithic ancestry in European populations. The numbers 1, 2 and 3 in the plots refer to Northern, Central, and Southern Sweden, respectively. In all panels, estimates using Native Americans are displayed on the x-axis, and estimates using a different outgroup are on the y-axis. Alternative outgroups: A) Papuans, B) JPT+CHB (Japanese and Chinese), and C) Uygur (Central Asia).

41

Druze

0.10 0.05 0.00 −0.10

−0.05

PC1

● ● ●● ● ●● ● ● ● ●● ● ● ●● ●● ●●● ●● ● ● ●●● ●● ●● ● ● ●● ● ● ●● ● Neolithic Finland ● ● ● ●●● ● hunter−gatherers ● ● ●● ● ●● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ●● ● ●● ● ● ● ● ●●● ●● ●● ● ●●● ● ●Russian ● ●●●● ● ● ● ● ● ● ● ● Northern Sweden ● ● ●● ●●● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● Southern/Central ●●●● ●●Sweden ● ●● ● ● ●● ●●● ● ● ● ● ● Orcadian ● ●●●●● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ●● ●●● ● ● ●● French ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● Basque ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●●● ● ●● ●Italian ●●● ● ●● ● ● ●● ● ●● North ● ● ● ● ●● ● ● ● ● ● Adygei ● ● ● ● ● ● ● ●●● ●●●● ● ●● ● ● ●● ●● ●● ● ● ● ● Tuscans ●●● ● ● ●● ● ●●● ● ● ●● ● ● ●● ● ● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ●● ●● ● Sardinian ●● ● ● ● ● ● ●● ●

−0.10

−0.05

0.00

0.05

0.10

0.15

PC2

Figure S21: Principal component analysis of combined data (14,113 SNPs) from the three Neolithic hunter-gatherers (Ajv52, Ajv70 and Ire8) and extant Europeans (including Swedes) and the Levantine Druze.

42

K=2

K=3

Druze

Adygei

Sardinian

Tuscan

Italy (TSI)

North Italian

Basque

French

Orcadian

NW European ancestry (CEU)

Southern Sweden

Central Sweden

Northern Sweden

Russian

Finland

Finland (LS)

GRK (hunter-gatherer)

K=4

Figure S22: Model-based clustering of combined data (14,113 SNPs) from the three Neolithic hunter-gatherers (Ajv52, Ajv70 and Ire8) and extant Europeans (including Swedes) and the Levantine Druze assuming 2, 3 and 4 clusters.

43

Table S1: Summary of samples. *Pagophilus groenlandicus, # Canis familiaris. Sample Species Context Age (years) Sex Locality Ajv52 Human PWC ca. 7 Ajvide Ajv70 ” PWC 20-25 Male Ajvide Ire8 ” PWC 40-50 Female Ire G¨ok4 ” TRB ca. 20 Female G¨okhem 94:1 Ste7 ” TRB G¨okhem 94:1 GSAj15 Harp seal* Ajvide # CFBe3a Domestic dog Link¨oping

44

Table S2: Index sequences Sample Index Ajv52 TGGTCA Ajv70 ATTGGC Ire8 GATCTG G¨ok4 GCCTAA Ste7 ACATCG GSAj15 CACTGT CFBe3a CGTGAT

45

Table S3: Summary of initial quality control and mapping. In the main text we report fraction of human reads based on Phusion-amplified libraries to allow direct comparison with the negative animal controls. Sample Indexed reads Hits to hg18 (%) Hits after rmdup (%) Phusion Ajv52 r1 Ajv70 r1 G¨ok4 r1 Ire8 r1 Ste7 r1 GSAj15 CFBe3a

22,799,659 19,842,862 10,409,532 15,582,175 12,811,226 6,033,355 12,278,562

766,058 (3.36%) 1, 375,902 (6.93%) 1,464,876 (14.07%) 842, 520 (5.41%) 96,619 (0.75%) 16,430 (0.27%) 28,697 (0.23%)

543,568 (2.38%) 962,367 (4.85%) 656,564 (6.31%) 422,951 (2.71%) 45,661 (0.36%) 1,172 (0.02%) 14,954 (0.12%)

Hifi Ajv52 r2 Ajv70 r2 G¨ok4 r2 Ire8 r2 Ste7 r2

18,768,393 20,367,126 19,460,817 17,032,858 18,633,041

1,046,737 (5.58%) 1,487,263 (7.30%) 2,121,404 (10.90%) 720,100 (4.23%) 223,274 (1.20%)

862,396 (4.59%) 1,309,407 (6.43%) 1,059,387 (5.44%) 564,792 (3.32%) 68,207 (0.37%)

Ste7 Ste7 Ste7 Ste7 Ste7 Ste7

13,079,481 13,724,774 13,761,976 13,678,842 13,685,086 13,531,935

r3 r4 r5 r6 r7 r8

Total Ajv52 Ajv70 G¨ok4 Ire8 Ste7

41,568,052 40,209,988 29,870,349 32,615,033 112,906,361

190,104 205,805 205,213 202,368 205,167 204,339

(1.45%) (1.50%) (1.49%) (1.48%) (1.50%) (1.51%)

1,812,795 (4.36%) 2,863,165 (7.12%) 3,586,280 (12.01%) 1,562,620 (4.79%) 318,710 (0.28%)

46

30,048 26,147 24,691 24,164 24,483 24,323

(0.23%) (0.19%) (0.18%) (0.18%) (0.18%) (0.18%)

1,405,964 (3.38%) 2,271,774 (5.65%) 1,715,951 (5.74%) 987,743 (3.03%) 173,849 (0.15%)

Sample Ajv52 Ajv70 Ire8 G¨ok4 Ste7

Reads 4,030 3,752 3,027 1,183 3,892

Table S4: Summary of Mean depth rCRS coverage 14.0 16,545 13.8 16,541 9.8 16,553 4.2 16,037 14.2 16,551

47

mtDNA assembly. % rCRS Total bases 99.86% 232,332 99.83% 227,824 99.90% 162,753 96.79% 67,861 99.89% 235,119

Mean length 57.7 60.7 53.8 57.4 60.4

48

Ste7

T

93G 6755A 7978T 9192A 10088T 11337G 15049T 16093C 16519C

195C 2405.1C 4646C 8860G*

4769G 8860G*

93G 4394T 4577T 4616T 6083T 7424G 10088T 16093C 16519C

195C 2405.1C 8860G*/195C 8818T 8860G*

sequence data were available are indicated with a *. Absent Hg defining SNPs Other diagnostic SNPs 4769G 8860G* -

298T 4814T 6086C 8345T 11071T 16145 16167T 16519C T2b3/T2b3a 73G 151T 263G 709A 750G 930A 4769G 5147A* 8860G 1473T 8838A 8856A 1438G 1888A 2706G 4216C 4917G 16296/152C 4769G 5147A* 16519C 7028T 8697A 10463C 10750G 5656G 8860G 16292T 11251G 11719A 11812G 13368A 16296T 14233G 14766T 14905A 15326G 15452A 15607G 15928A 16126C 16294T 16304C

Table S5: Summary of haplogroup (Hg) defining polymorphisms. Positions were no Sample Hg in ref. 8 (d-loop) New Hg Present Hg defining SNPs Ajv52 V V 72C 263G 750G 1438G 2706G 4580A 7028T 15326G 15904T 16298C Ajv70 U4/H1b U4d/U4a2 73G 263G T310C 499A 629C 750G 1438G 1811G 2706G 4646C 4769G 5999C 6047G 7028T 11332T 11467G 11719A 12308G 12372A 14620T 14766T 15326G 15693C 16356C Ire8 U4/H1b U4d 73G 263G 499A 629C 750G 1438G 1811G 2706G 4769G 5999C 6047G 7028T 11332T 11467G 11719A 12308G 12372A 14620T 14766T 15326G 15693C 16356C G¨ok4 H H 263G 750G 1438G 15326G

Table S6: Summary of mtDNA polymorphisms not informative for haplogroup assignment. Sample Non-diagnostic SNPs Ajv52 1224T 1443C 1449A 1933T 4776A 5228T 5763A 5882T 5890T 6880T 7095T 7098T 7380A 7382A 7422A 7588A 7743T 9693T 14622T Ajv70 4398T 4585T 4590G 4599T 4604T 4617T 4624T 4633T 4636T 6869T Ire8 4T 17T 18T 29T 33T 1224T 1226T 1315A 2174A 2182A 2185A 4153A 4189T 5211T 5974T 6075A 6181T 6993A 7412T 7427T 7974T 8351T 8375T 8998A 10213T 13860T G¨ok4 320T 791A 1084T 1108T 1223T 1334A 1472A 1601T 1918T 2956G 2981T 3835T 3868C 4003C 4092A 4220C 4656A 4724T 4848A 4849A 5172A 5245C 5490T 5491T 5540A 5671A 5679A 5688A 5693C 5784T 5818T 5871A 5899T 5901T 6049A 6368T 6402G 6421T 6434T 6447T 6448T 6449T 6463T 6568T 6573A 6576A 6884T 7090A 7108A 7183G 7185G 7618T 7627T 7628T 7633T 7792T 7794T 7801T 8243A 8346T 8352T 8610T 8719A 8923G 8941T 9030T 9693T 10255G 11073T 11074T 11219T 12110T 12264A 12267T 13431T 13436T 13455T 13458T 13464T 13625T 14528T 14532T 14533T 14535T 14685A 14771T 15434T 15436T 15437A 16369A 16379T 16380T Ste7 1951T 4358A 4488T 4521T 4776A 6081A 6087A 6156T 6753A 6754A 7367T 7369T 7370T 7382A 7462T 7470T 7957T 7958T 8812C 9693T 13347T

49

Table S7: Triallelic site frequency in genome sequence data. Total number of SNPs Neolithic data has Neolithic data has Sample (deamination excluded) third allele (count) third allele (%) Ajv52 94,650 2,276 2.46% 153,973 3,833 2.55% Ajv70 Ire8 49,744 2,248 4.73% G¨ok4 105,274 2,929 2.86% Ste7 8,794 1,303 17.39%

50

Table S8: Triallelic site frequency in genotype array data. Total number of SNPs Neolithic data has Neolithic data has Sample (deamination excluded) third allele (count) third allele (%) Ajv52 26,495 435 1.61% 44,788 837 1.83% Ajv70 Ire8 11,763 279 2.32% G¨ok4 27,910 763 2.66% Ste7 1,348 68 4.80%

51

Table S9: Number of SNPs called using sequences with (’Degraded’) and without (’Not degraded’) evidence of cytosine deamination in the the 5’ end. Sample Degraded Not degraded Ajv52 2,551 4,870 3,070 8,946 Ajv70 Ire8 686 1,532 G¨ok4 1,273 5,926

52

Table S10: Allele sharing between Neolithic- and extant European populations. Sorted according to allele sharing with Neolithic farmer. Population Label Sharing with Sharing with Neolithic Number of in Fig. 3C Neolithic farmer hunter-gatherers Long Lat chr. Cyprus Cyp 68.20% 68.21% 33 35 8 Greece Gre 67.94% 68.51% 22 39 16 France Fra 67.89% 68.80% 2 46 178 Netherlands Net 67.88% 68.79% 5 52 34 Romania Rom 67.84% 68.62% 25 46 28 Italy Ita 67.81% 68.43% 12 42 438 Germany Ger 67.80% 68.80% 10 51 142 Croatia Cro 67.76% 68.67% 15 45 16 Portugal Por 67.75% 68.59% -8 39 256 Belgium Bel 67.73% 68.78% 4 50 86 Spain Spa 67.72% 68.59% -4 40 272 Poland Pol 67.71% 68.98% 20 52 44 Austria Aus 67.69% 68.65% 13 47 28 United Kingdom UK 67.68% 68.79% -2 53 400 Serbia Ser 67.67% 68.62% 20 44 88 Macedonia Mac 67.62% 68.58% 22 41 8 Sweden Swe 67.61% 68.84% 15 62 20 Ireland Ire 67.61% 68.71% -8 53 122 Hungary Hun 67.60% 68.58% 20 47 38 Russian Rus 67.56% 68.72% 37 55 12 Turkey Tur 67.55% 67.98% 35 39 8 FIN FIN 67.47% 68.77% 25 61 80 LSFIN LSFIN 67.44% 68.79% 26 64 162 Bosnia Bos 67.39% 68.81% 17 44 18 Scotland Sco 67.35% 68.81% -4 56 10

53

Table S11: Population divergence time of Neolithic individuals from modern Northwest European ancestry (CEU). NA12891 NA12892 ˆ Sample Nconc Ndisc T Nconc Ndisc Ajv52 3,986 7,549 0.0185 3,843 7,469 Ajv70 6,132 11,727 0.0151 6,075 11,750 2,016 3,914 0.0100 1,960 3,840 Ire8 G¨ok4 4,324 8,274 0.0150 4,362 8,129 PWC (Ajv52+Ajv70+Ire8) 11,843 22,628 0.0155 11,608 22,514

54

genomes of

Tˆ 0.0096 0.0113 0.0069 0.0241 0.0103

55

Table S12: Population divergence (YRI). YH Nconc Ndisc Tˆ Ajv52 4,918 6,642 0.1490 Ajv70 7,761 10,270 0.1574 Ire8 2,410 3,434 0.1262 G¨ok4 5,556 7,226 0.1649 PWC 14,724 19,854 0.1493 Nconc 4,908 7,834 2,481 5,378 14,861

SJK Ndisc 6,381 9,903 3,237 6,951 19,055 Tˆ Nconc 0.1650 4,252 0.1773 6,826 0.1635 2,182 0.1676 4,653 0.1710 12,921

NA19238 Ndisc 7,496 11,487 3,793 8,001 22,244

(YRI) NA19239 (YRI) NA18507 (YRI) ˆ ˆ T Nconc Ndisc T Nconc Ndisc Tˆ 0.0438 4,187 7,773 0.0254 4,177 8,066 0.0118 0.0609 6,750 12,244 0.0336 6,521 12,349 0.0185 0.0459 2,113 3,994 0.0192 2,121 4,197 0.0036 0.0529 4,730 8,577 0.0337 4,676 8,723 0.0237 0.0525 12,721 23,397 0.0287 12,524 23,987 0.0146

time of Neolithic individuals from modern genomes of East Asian ancestry and West African ancestry

Table S13: Examples of concordance tests relating the Neolithic individuals with extant human populations. (A, B), (C, D) (Sardinian,G¨ok4),(French,TSI) (Yoruba,Druze),(G¨ok4,LSFIN) (Druze,G¨ok4),(CEU,FIN) (FrenchBasque,G¨ok4),(CEU,Russian) (TSI,G¨ok4),(Orcadian,FIN)

Nconc 564 598 556 532 578

Ndisc1 493 500 476 457 477

Ndisc2 437 498 475 457 474

C 0.067 0.089 0.078 0.076 0.096

SE 0.028 0.025 0.024 0.023 0.027

Z 2.43 3.50 3.20 3.30 3.55

(Druze,Sardinian),(Russian,Neolithic HGs) 1,609 (Druze,TSI),(LSFIN,Neolithic HGs) 1,637 (Druze,Sardinian),(LSFIN,Neolithic HGs) 1,591 (Druze,Sardinian),(Orcadian,Neolithic HGs) 1,599 (Druze,Sardinian),(FIN,Neolithic HGs) 1,670

1,439 1,486 1,426 1,443 1,487

1,399 1,486 1,376 1,406 1,436

0.055 0.048 0.055 0.051 0.058

0.016 0.015 0.015 0.017 0.016

3.47 3.20 3.66 3.10 3.73

56

Table S14: 4-population tests for admixture with G¨ok4 assuming four different outgroups (O). Tests with |Z| > 3 are highlighted in bold. Pop X Z(O = Yoruba) Z(O = N. Am.) Z(O = Papuan) Z(O = Druze) Sardinian -5.74 -4.40 -4.50 -1.15 TSI+Tuscan -5.03 -3.33 -3.66 -0.64 North Italian -5.40 -3.58 -3.80 -0.67 French Basque -5.53 -3.46 -3.74 -1.13 French -5.30 -3.21 -3.53 -1.13 CEU -5.35 -3.12 -3.64 -1.22 Orcadian -4.97 -2.61 -3.39 -0.82 Russian -4.78 -1.66 -2.51 -0.84 FIN -4.84 -1.91 -2.70 -1.00

57

Table S15: Estimated Neolithic farmer ancestry in European populations. Population G¨ok4-related ancestry (c) Standard error Sardinian 95.3% 13.4% North Italian 78.9% 12.1% 77.4% 11.1% TSI Tuscan 69.8% 11.6% French Basque 65.0% 10.2% French 55.3% 8.5% 51.7% 7.8% CEU 46.9% 7.8% Orcadian Adygei 46.4% 7.7% 41.1% 7.8% Southern Sweden Central Sweden 35.8% 7.1% 31.1% 6.3% Northern Sweden FIN 14.0% 3.4% Russian 11.3% 4.1%

58

References and Notes 1. G. V. Childe, The Dawn of European Civilization (Kegan, Paul, Trench & Trubner. London, 1925). 2. P. Menozzi, A. Piazza, L. Cavalli-Sforza, Synthetic maps of human gene frequencies in Europeans. Science 201, 786 (1978). doi:10.1126/science.356262 Medline 3. L. Cavalli-Sforza, P. Menozzi, A. Piazza, The History and Geography of Human Genes (Princeton Univ. Press, Princeton, NJ, 1994). 4. J. Novembre, S. Ramachandran, Perspectives on human population structure at the cusp of the sequencing era. Annu. Rev. Genomics Hum. Genet. 12, 245 (2011). doi:10.1146/annurevgenom-090810-183123 Medline 5. C. Renfrew, P. G. Bahn, Archaeology: Theories, Methods, and Practice (Thames and Hudson, New York, 1991). 6. M. Stoneking, J. Krause, Learning about human population history from ancient and modern genomes. Nat. Rev. Genet. 12, 603 (2011). doi:10.1038/nrg3029 Medline 7. B. Bramanti et al., Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326, 137 (2009). doi:10.1126/science.1176869 Medline 8. H. Malmström et al., Ancient DNA reveals lack of continuity between neolithic hunter-gatherers and contemporary Scandinavians. Curr. Biol. 19, 1758 (2009). doi:10.1016/j.cub.2009.09.017 Medline 9. W. Haak et al.; Members of the Genographic Consortium, Ancient DNA from European early neolithic farmers reveals their near eastern affinities. PLoS Biol. 8, e1000536 (2010). doi:10.1371/journal.pbio.1000536 Medline 10. M. Lacan et al., Ancient DNA reveals male diffusion through the Neolithic Mediterranean route. Proc. Natl. Acad. Sci. U.S.A. 108, 9788 (2011). doi:10.1073/pnas.1100723108 Medline 11. M. Rasmussen et al., Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757 (2010). doi:10.1038/nature08835 Medline 12. M. Malmer, The Neolithic of South Sweden: TRB, GRK, and STR (Royal Swedish Academy of Letters History and Antiquities; Almquist & Wiksell International, Stockholm, Sweden, 2002). 13. H. Malmström et al., More on contamination: The use of asymmetric molecular behavior to identify authentic ancient human DNA. Mol. Biol. Evol. 24, 998 (2007). doi:10.1093/molbev/msm015 Medline 14. H. Malmström et al., High frequency of lactose intolerance in a prehistoric hunter-gatherer population in northern Europe. BMC Evol. Biol. 10, 89 (2010). doi:10.1186/1471-2148-10-89 Medline 15. K.-G. Sjögren, Megaliths and Identities, paper presented at the third European Megalithic Studies Group Meeting, 13 to 15 May 2010, Kiel University, Germany, M. Furholt, F. Lüth, J. Müller, C. Scarre, Eds.; www.jungsteinsite.de. 16. See supplementary materials on Science Online. 17. K.-G. Sjögren, T. Price, T. Ahlström, Megaliths and mobility in south-western Sweden. Investigating relationships between a local society and its neighbours using strontium isotopes. J. Anthropol. Archaeol. 28, 85 (2009). doi:10.1016/j.jaa.2008.10.001

18. G. Lunter, M. Goodson, Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936 (2011). doi:10.1101/gr.111120.110 Medline 19. A. W. Briggs et al., Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. U.S.A. 104, 14616 (2007). doi:10.1073/pnas.0704665104 Medline 20. N. Patterson, A. L. Price, D. Reich, Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006). doi:10.1371/journal.pgen.0020190 Medline 21. J. Z. Li et al., Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100 (2008). doi:10.1126/science.1153717 Medline 22. I. Surakka et al., Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging. Genome Res. 20, 1344 (2010). doi:10.1101/gr.106534.110 Medline 23. D. M. Altshuler et al.; International HapMap 3 Consortium, Integrating common and rare genetic variation in diverse human populations. Nature 467, 52 (2010). doi:10.1038/nature09298 Medline 24. M. R. Nelson et al., The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. 83, 347 (2008). doi:10.1016/j.ajhg.2008.08.005 Medline 25. R. M. Durbin et al.; 1000 Genomes Project Consortium, A map of human genome variation from population-scale sequencing. Nature 467, 1061 (2010). doi:10.1038/nature09534 Medline 26. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655 (2009). doi:10.1101/gr.094052.109 Medline 27. M. Jakobsson, N. A. Rosenberg, CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801 (2007). doi:10.1093/bioinformatics/btm233 Medline 28. E. Salmela et al., Swedish population substructure revealed by genome-wide single nucleotide polymorphism data. PLoS ONE 6, e16747 (2011). doi:10.1371/journal.pone.0016747 Medline 29. J. Krause et al., A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr. Biol. 20, 231 (2010). doi:10.1016/j.cub.2009.11.068 Medline 30. L. Orlando et al., True single-molecule DNA sequencing of a pleistocene horse bone. Genome Res. 21, 1705 (2011). doi:10.1101/gr.122747.111 Medline 31. D. Reich, K. Thangaraj, N. Patterson, A. L. Price, L. Singh, Reconstructing Indian population history. Nature 461, 489 (2009). doi:10.1038/nature08365 Medline 32. P. Skoglund, A. Götherström, M. Jakobsson, Estimation of population divergence times from nonoverlapping genomic sequences: Examples from dogs and wolves. Mol. Biol. Evol. 28, 1505 (2011). doi:10.1093/molbev/msq342 Medline 33. A. Fischer, Food for Feasting? An Evaluation of Explanations of the Neolithisation of Denmark and Southern Sweden, no. 12 in Sheffield Archaeological Monographs (J. R. Collis Publication, Sheffield, UK, 2002). 34. F. Hallgren, Identitet i praktik: lokala, regionala och överregionala sociala sammanhang inom nordlig trattbägarkultur, no. 17 in Coast to Coast-Book (Department of Archaeology and Ancient History, Uppsala University, Uppsala, Sweden, 2008).

35. M. Midgley, The Megaliths of Northern Europe (Routledge, New York, 2008). 36. N. Stenbäck, Människorna vid havet: platser och keramik på Ålands-öarna perioden 3500-2000 f.Kr., Stockholm studies in archaeology 28 (Department of Archaeology, Stockholm University, Stockholm, Sweden, 2008). 37. T. Ahlström, Lund Archaeol. Rev. 3, 37 (1997). 38. T. Ahlström, Stockholm Archaeol. Rep. 33, 325 (1997). 39. T. Ahlström, Grave or Ossuary? Osteological Finds from a Recently Excavated Passage Tomb in Falbygden, no. 1201 in British Archaelogical Reports International Series (Archaeopress, Oxford, 2003). 40. T. Ahlström, Underjordiska dödsriken humanosteologiska studier av neolitiska kollektivgravar, no. 18 of Coast to Coast Books (Göteborgs Universitet, Lund, Sweden, 2009). 41. K.-G. Sjögren, Fragment av ordning. Undersökning av överplöjda megalitgravar vid Frälsegården, Gökhems socken, Västergötland, no. 23 in Västergötlands Museum Rapport (Göteborg University, Lund, Sweden, 2008). 42. G. Eriksson et al.,, J. Anthropol. Archaeol. 27, 520 (2008). 43. G. Eriksson, Journal of Anthropological Archaeology pp. 135-162 (2004). 44. K. Lidén, Prehistoric Diet Transitions—An Archaeological Perspective, no. 1 in Theses and Papers in Scientific Archaeology (Stockholm University, Hässelby, Sweden, 1995). 45. C. Lindqvist, G. Possnert, The Subsistence Economy and Diet at Jakobs/Ajvide, Eksta Parish and other Prehistoric Dwelling and Burial sites on Gotland in Long-term Perspective, no. 13:a in Theses and Papers in Scientific Archaeology (Stockholm University, Hässelby, Sweden, 1997). 46. K. Lidén, Laborativ Arkeologi 9, 5 (1995). 47. K. Lidén, G. Eriksson, B. Nordqvist, A. Götherström, E. Bendixen, Antiquity 79, 23 (2004). 48. J. Norderäng, 14C-dateringar från Ajvide, no. 3 in Gotland University Press Monograph (Gotland University, Visby, Sweden, 2008). 49. G. Janzon, Gotlands Mellanneolitiska Gravar, Studies in North-European Archaeology (Almqvist & Wiksell, Stockholm, Sweden, 1974). 50. P. Molnar, Tracing Prehistoric Activities. Life Ways, Habitual Behaviour and Health of HunterGatherers on Gotland, Theses and Papers in Osteoarchaeology no. 4. (Stockholm University, Stockholm, Sweden, 2002). 51. D. Y. Yang, B. Eng, J. S. Waye, J. C. Dudar, S. R. Saunders, Improved DNA extraction from ancient bones using silica-based spin columns. Am. J. Phys. Anthropol. 105, 539 (1998). doi:10.1002/(SICI)1096-8644(199804)105:4<539::AID-AJPA10>3.0.CO;2-1 Medline 52. H. Li et al.; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078 (2009). doi:10.1093/bioinformatics/btp352 Medline 53. R. E. Green et al., Analysis of one million base pairs of Neanderthal DNA. Nature 444, 330 (2006). doi:10.1038/nature05336 Medline 54. R. E. Green et al., A draft sequence of the Neandertal genome. Science 328, 710 (2010). doi:10.1126/science.1188021 Medline

55. Z. Zhang, S. Schwartz, L. Wagner, W. Miller, A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203 (2000). doi:10.1089/10665270050081478 Medline 56. M. L. Kampmann et al., A simple method for the parallel deep sequencing of full influenza A genomes. J. Virol. Methods 178, 243 (2011). doi:10.1016/j.jviromet.2011.09.001 Medline 57. H. N. Poinar et al., Metagenomics to paleogenomics: Large-scale sequencing of mammoth DNA. Science 311, 392 (2006). doi:10.1126/science.1123360 Medline 58. A. Kloss-Brandstätter et al., HaploGrep: A fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25 (2011). doi:10.1002/humu.21382 Medline 59. M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386 (2009). doi:10.1002/humu.20921 Medline 60. R. M. Andrews et al., Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999). doi:10.1038/13779 Medline 61. P. Scheet, M. Stephens, A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629 (2006). doi:10.1086/502802 Medline 62. J. K. Pickrell et al., Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826 (2009). doi:10.1101/gr.087577.108 Medline 63. J. Novembre et al., Genes mirror geography within Europe. Nature 456, 98 (2008). doi:10.1038/nature07331 Medline 64. S. Levy et al., The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007). doi:10.1371/journal.pbio.0050254 Medline 65. J. Wang et al., The diploid genome sequence of an Asian individual. Nature 456, 60 (2008). doi:10.1038/nature07484 Medline 66. S. M. Ahn et al., The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group. Genome Res. 19, 1622 (2009). doi:10.1101/gr.092197.109 Medline 67. D. Pushkarev, N. F. Neff, S. R. Quake, Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847 (2009). doi:10.1038/nbt.1561 Medline 68. Chimpanzee Sequencing and Analysis Consortium, Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69 (2005). doi:10.1038/nature04072 Medline 69. S. C. Schuster et al., Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943 (2010). doi:10.1038/nature08795 Medline 70. C. Wang et al., Stat. Appl. Genet. Mol. Biol. 9, 13 (2010). 71. B. E. Engelhardt, M. Stephens, Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis. PLoS Genet. 6, e1001117 (2010). doi:10.1371/journal.pgen.1001117 Medline 72. A. Ginolhac, M. Rasmussen, M. Gilbert, E. Willerslev, L. Orlando, Bioinformatics 27, 2153 (2011). 73. H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589 (2010). doi:10.1093/bioinformatics/btp698 Medline

74. N. Rosenberg, distruct: A program for the graphical display of population structure. Mol. Ecol. Notes 4, 137 (2004). doi:10.1046/j.1471-8286.2003.00566.x 75. M. García-Garcerà et al., Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics. PLoS ONE 6, e24161 (2011). doi:10.1371/journal.pone.0024161 Medline 76. M. Fogg, L. Pearl, B. Connolly, Structural basis for uracil recognition by archaeal family B DNA polymerases. Nat. Struct. Mol. Biol. 9, 922 (2002). doi:10.1038/nsb867 77. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792 (2004). doi:10.1093/nar/gkh340 Medline 78. Fields Development Team, Fields: Tools for Spatial Data (National Center for Atmospheric Research, Boulder, CO, 2006); www.cgd.ucar.edu/Software/Fields. 79. J. Wakeley, Coalescent Theory (Roberts & Company, Greenwood Village, CO, 2008). 80. I. Gronau, M. Hubisz, B. Gulko, C. Danko, A. Siepel, Nat. Genet. 43, 1031 (2011). 81. A. Keinan, J. C. Mullikin, N. Patterson, D. Reich, Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nat. Genet. 39, 1251 (2007). doi:10.1038/ng2116 Medline 82. D. Reich et al., Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053 (2010). doi:10.1038/nature09710 Medline 83. C. J. Creevey, J. O. McInerney, Clann: Investigating phylogenetic information through supertree analyses. Bioinformatics 21, 390 (2005). doi:10.1093/bioinformatics/bti020 Medline 84. M. Jakobsson et al., Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998 (2008). doi:10.1038/nature06742 Medline 85. P. Moorjani et al., The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7, e1001373 (2011). doi:10.1371/journal.pgen.1001373 Medline

Supplement - Jakobsson Lab

Apr 27, 2012 - against a large database with genomic information for several ..... joining tree using an algorithm in the clann software package (83), and find that the ...... Mac. 67.62%. 68.58%. 22. 41. 8. Sweden. Swe. 67.61%. 68.84%. 15.

4MB Sizes 2 Downloads 292 Views

Recommend Documents

Supplement - Jakobsson Lab
Apr 27, 2012 - are affected by the collective burial custom. ...... joining tree using an algorithm in the clann software package (83), and find ...... London, 1925). ..... Fields Development Team, Fields: Tools for Spatial Data (National Center for 

Social Phishing - Markus Jakobsson
Dec 12, 2005 - a phisher were able to induce an interruption of service to a ... Table 1: Results of the social network phishing attack and control experiment.

Supplement
water soluble, rapidly absorbed from the GI tract, has high bioavailability,21,22 and reaches ...... Gage et al197 developed a dosing algorithm based on CYP2C9 ...

Supplement
Additional evidence supports improvements in health-care utilization and psychosocial outcomes. There are few additional ... top 10 in which mortality continues to increase.5– 8 In persons 55 to 74 .... pulmonary rehabilitation programs have been d

Supplement to - GitHub
Supplemental Table S6. .... 6 inclusion or exclusion of certain genetic variants in a pharmacogenetic test ..... http://aidsinfo.nih.gov/contentfiles/AdultandAdolescentGL.pdf. .... 2.0 are expected to exhibit higher CYP2D6 enzyme activity versus ...

Supplement
sive selecting out of more and more genetically fit persons of very old age lays the foundation for a ... Supplement. © 2003 American College of Physicians 445 ...

Online Supplement
Proof Suppose u ∈ S. Then, Φt(u) converges to the origin exponentially as t ... ∗ISE dept., KAIST, Daejeon, South Korea, Email: [email protected], Tel.

Supplement Sheet.pdf
10 hours ago - +17000840 +B/R Blackcap Empress 2212 # N Bar Emulation EXT. B/R Blackcap Empress 76. Page 1 of 1. Supplement Sheet.pdf. Supplement Sheet.pdf. Open. Extract. Open with. Sign In. Details. Comments. General Info. Type. Dimensions. Size. D

Supplement
Health Interview Survey, and other data now document declining disability trends beginning in 1982 and .... understanding of the basic science of aging, the rates of increases will continue to slow. Constant ..... Requests for Single Reprints: James

Supplement Sheet.pdf
PEN B16. 53, 55. PEN B18. 52, 56, 57. PEN B12. 62, 63. PEN B10. 60, 61. SUPPLEMENT INFORMATION. LOT 7: 38 cm SC. LOT 17: Updated EPD's. HB GM CED BW WW YW M ME. 109 52 6 -.07 73 121 19 -2. HPG CEM STY Marb. YG CW REA Fat. 11 5 10 .51 0.06 41 .17 0.0.

SJS Supplement Sheet.pdf
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. SJS Supplement ...

KEARNS SUPPLEMENT SHEET.pdf
77 294D KCC1 WALSTREET 294D MR SB WALLSTREET 409B PERCENTAGE. 78 922D KCC1 CORPORATE 922D ES CORPORATE BX80 PERCENTAGE.

kenya gazette supplement -
“employer” means a person, public body, firm, ..... management or the administration thereof; .... (a) holds a Masters degree in a relevant field of study as may be ...

Bodybuilding Supplement Guide
Yet the human race prevailed. The catch, unfortunately, is that those who have a considerable propensity to store fat survived. Thus, the 20th-century human is someone who has ... Based on that, a 200-pound man would require a mere ... high-carb food

Medicare Supplement Plans.pdf
medicare supplement plans 2017. best medicare supplement plans. medicare supplemental plans. Whoops! There was a problem loading this page. Retrying.

Frenzen Supplement Sheet.pdf
Lot 159: Heifer calf born March 24, 2018 by Frenzen Direct Force D26. Lot 160: Heifer calf born March 25, 2018 by Frenzen Denim D41. Lot 163: Heifer calf born March 26, 2018 by Frenzen Denim D41. Lot 167: Bull calf born March 13, 2018 by THM Excede Z

supplement to study material - ICSI
Ensure that advertisement giving details relating to oversubscription, basis ... Ensure that no advertisement or distribution material with respect to the issue.

Yrl Supplement use.pdf
5413 6216. Page 1 of 1. Yrl Supplement use.pdf. Yrl Supplement use.pdf. Open. Extract. Open with. Sign In. Main menu. Displaying Yrl Supplement use.pdf.

Medicare Supplement Insurance.pdf
almost always better to have more insurance rather than the find yourself underinsured. Affordable Health Insurance. Phone: 855-745-5422. Email: [email protected]. Website: https://keyhealthplans.com. Google Site: https://sites.googl

DNA Supplement Sheet.pdf
Page 1 of 2. Lot # Name CED% BW% WW% YW% DMI% YH% SC% DOC% HP% CEM% MILK% MW% MH% CW% MARB% RE% FAT% TEND%. 1 S S Niagara E34 8 10 5 3 69 64 60 2 55 83 11 25 46 26 26 19 50 54. 2 S S Niagara E83 6 7 25 31 90 99 81 71 3 27 32 29 77 66 33 50 80 7. 3 S

Reich Supplement Sheet.pdf
61 D211 2-Apr-16 EXT 1256 4.1 36.4 0.34 0.03 3.72 12.9 0.31. 62 D020 7-Mar-16 EM 1107 3.1 37.5 0.3 0.55 2.29 12.3 0.28. 64 D125 21-Mar-16 GC 1233 3.7 34.0 0.48 0.49 3.29 13.3 0.34. 65 D152 24-Mar-16 GC 1186 3.0 34.1 0.48 0.54 3.09 14.5 0.41. 65 D206

Medicare Supplement Plans.pdf
record of involvement in the senior insurance marketplace and/or a high financial. strength rating. 2. Medigap plans can be used at any doctor or hospital, nationwide, regardless of. which company sells you the plan. Many types of insurance are netwo

supplement to study material - ICSI
(ii) the issuer undertakes to provide market-making for at least two years from ..... buyers if an issuer has not satisfied the basic eligibility criteria and undertakes ...... buyers on proportionate basis as per illustration given in Part C of Sche