Background

Currently, there are some known drawbacks in Greenplum DB:

1. The cluster has to restart while adding more segments, because the hash algorithm mapping tuples to segments is not monotonic: we have to change each table's distribution_policy, re-distribute all the data (which locks the table in Access Exclusive mode), and then change the distribution_policy back. This is a very high cost.
2. Data is stored on every segment. Although we have added support for replicated tables, we can still make data distribution more flexible, e.g. distributing a table's data only on some user-specified segments.
3. Gang size is always 1 or equal to N (the number of segments). This may lead to two problems:
   a. We might spawn too many processes in the system, which can cause workload problems.
   b. We are not able to spawn more processes for computation to improve performance.

Goals

We want to make data distribution more flexible in Greenplum DB, so that:

1. Elastic expansion: the database stays available while expanding the GPDB cluster. This will benefit Greenplum on Cloud.
2. Elastic storage: data in a table does not need to be distributed to every segment.
3. Dynamic gang size:
   a. High concurrency: we do not need to spawn so many processes when data is only distributed on a few segments and the system's workload is high.
   b. High performance: we can gain computing performance by spawning more processes in a gang.
4. Cloud support: with the elastic capabilities above, it becomes much easier and better to run Greenplum on a cloud platform.

Important Features and the Implementation

Region: New Data Distribution

We introduce the concept of a "region" to make data distribution more flexible, and we show that we can achieve the above goals by implementing regions correctly and efficiently.

A region is a set of segments. Users can `create`/`drop` regions to manage them. When creating a table, users can specify the region the table binds to, so that data inserted into the table is only stored on the segments within that region.

Manage Region DDL

-- this statement creates a region named r1, with 4 segments: seg1, seg2, seg3, seg9.
create region r1 with segments='1-3,9';

-- this statement drops the region named r1.
-- If any tables are bound to region r1, this statement will fail.
drop region r1;

There is an open design question for regions: should a region be cluster-level or database-level? Currently, we prefer to implement regions at the cluster level. We do not allow an `alter region` statement, because it would be so heavy that we would have to change every table bound to the region; the statement might hang for a long time, and what if it crashes just before finishing?

Bind Table to a Region

-- This statement creates a table and binds it to region r1 (1-3,9),
-- so that t1's data is distributed on r1's segments by hashing column c.
create table t1 (c int) with region=r1 distributed by (c);

-- Insert data into t1: the four tuples (1),(2),(3),(4) will be sent to
-- seg1, seg2, seg3, seg9, hash-distributed across these 4 segments.
insert into t1 values (1), (2), (3), (4);

-- This statement changes t1's binding region to r2 (suppose r2's segments are
-- seg1, seg2, seg3, seg9, seg10, i.e. r1 plus one new segment).
-- This causes data movement: t1's data will be re-distributed on the new region.
-- We should implement the movement very efficiently, and table t1 should stay
-- available (at least for read) during the process.
-- We will discuss an efficient implementation later.
alter table t1 set region to r2;

-- DBA should easily know a table's binding region.
select region_oid from gp_distribution_policy where rel_oid=table_t_oid;

-- DBA should easily know what tables are in a region.
select rel_oid from gp_distribution_policy where region_oid=r_oid;

Internal Design and Implementation

The statement that causes data movement is the most complex, so let's discuss it in more detail. If t1 is distributed randomly, the movement is quite easy, so we focus on a hash-distributed table's data movement.

alter table t1 set region to r2; -- at the beginning, t1 is on r1

1. r2 is a proper superset of r1. For example, r1 is (seg0, seg1, seg2) and r2 is (seg0, seg1, seg2, seg3). In this case we should make sure data movement only happens from seg0-2 to seg3, i.e. our hash-distribution algorithm must be monotonic, and we should try our best to reduce the amount of data that needs to move. We could implement the process with the following plan:

Insert on t (slice2 on r2)
  -> Redistribute Motion r1 to r2
       Hash Key: public.t.c1
       -> Delete on t (slice1 on r1)
            -> Seq Scan on t (slice1 on r1)

The key point to pay attention to is that the insert node and the delete node are executed on different sets of segments.

2. r2 is a proper subset of r1. We can still use the above plan, as long as the delete node and the insert node run on different segments.
3. In all other cases (r1 and r2 overlap arbitrarily), we can still use the above plan for the same reason.

The whole process should run in a transaction, and we should lock the table in Exclusive mode at the beginning of that transaction. Other transactions will not see the region change before we commit, due to MVCC.

Let's see an example of a query between two regions.

create region r1 with segments='6,7,8';
create region r2 with segments='8,9,10,11';
create table t1 (c int) distributed by (c) on region r1;
create table t2 (c int) distributed by (c) on region r2;
-- then insert some data into t1 and t2
-- ########################################################

select * from t1;
-- this should create a gang with 3 processes, on seg6, seg7, seg8

select * from t2;
-- this should create a gang with 4 processes, on seg8, seg9, seg10, seg11

select * from t1 join t2 on t1.c = t2.c;
-- Attention! In current GPDB there is no motion here. However, with regions,
-- t1 and t2 are on different regions, so we have to move data.
-- This will create two gangs:
--   * gang1: table scan on t1, plus a redistribute motion to gang2
--   * gang2: table scan on t2, receives gang1's data via the redistribute motion, then does the join

By studying this example, we can draw two conclusions:
1. the processes in a gang are determined by the region
2. a motion can transfer data between regions, i.e. the send and receive sides can be on different regions (see the sketch below)
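To make conclusion 2 concrete, here is a minimal sketch in C, assuming a simple hash-modulo placement onto the destination region; `motion_target_segment` and the segment-id array are illustrative names, not actual GPDB code. The point is that the sending side of the motion hashes tuples into the destination table's region, not its own:

#include <stdint.h>
#include <stdio.h>

/* A redistribute motion between regions: each sender process (running on
 * the source region, e.g. t1's region r1) picks the target segment from
 * the *destination* table's region (e.g. t2's region r2). */
static int
motion_target_segment(const int *dest_segids, int dest_nsegs, uint32_t hash)
{
    /* hash into the destination region's segment list */
    return dest_segids[hash % dest_nsegs];
}

int
main(void)
{
    int r2_segids[] = {8, 9, 10, 11};   /* t2's region r2 */

    /* a sender process on region r1 routes a tuple whose hash is 42 */
    printf("send to seg%d\n", motion_target_segment(r2_segids, 4, 42));
    return 0;
}

In other words, the receive side of a motion defines both the gang that runs above it and the hashing space the senders target; the sender's own region only determines where the scan runs.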

Expanding Cluster Online

With regions we can expand a Greenplum cluster online. The workflow is:
1. get the new segments ready (binaries, directories, ssh keys, ...)
2. add them into the cluster
   a. run initdb on the new segments
   b. make the QD connect to the new segments and keep track of their states
   c. update the QD's segment configuration
3. create a new region that contains all the segments
4. alter some tables' regions to move data, now or later (the database stays available even if we skip this step)

Underlying Storage Management

The current hash algorithm is very simple: hash the tuple to a value, then take it modulo the number of segments. The biggest drawback is that after adding one segment to the cluster, we have to re-arrange all the data among all the segments. (Only if we double the number of segments can we move just part of the data from the old segments to the new ones in a single-direction flow; but the cost is doubling the cluster.) To solve this problem we have to design a new hash algorithm.

We could use consistent hashing with virtual nodes, or we could introduce a bucket concept. Compared with consistent hashing, buckets make balanced data distribution easier, and the distribution logic is clearer, so we might use it to optimize queries in the future. Let's talk about the bucket method here.

Each tuple is first mapped to a hash value using the current method. Then a simple map takes this hash value to a bucket id. For each table, we maintain its bucket-id-to-segment-id map. With the bucket layer, when we add one segment to the cluster, the database keeps working correctly as long as we have not yet updated the bucket-to-segment mapping. Furthermore, we no longer need to re-arrange all the data: we can just move some buckets from several segments to the new one, which moves much less data.

One implementation problem remains: how do we find all the tuples in a bucket efficiently?
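Setting that question aside for now, here is a minimal sketch of the bucket layer in C. `NUM_BUCKETS`, `hash_to_bucket`, `route_tuple`, and `buckets_to_move` are illustrative names under the assumptions above, not the actual implementation:

#include <stdint.h>

/* The bucket count is fixed and much larger than any realistic segment
 * count, so buckets can be reassigned between segments without ever
 * rehashing tuples. Illustrative value only. */
#define NUM_BUCKETS 4096

/* Step 1: tuple hash value -> bucket id (fixed, table-independent). */
static inline int
hash_to_bucket(uint32_t tuple_hash)
{
    return tuple_hash % NUM_BUCKETS;
}

/* Step 2: bucket id -> segment id, via the per-table map kept in the
 * catalog (here just an array). Routing a tuple is therefore:
 * segment = bucket_map[tuple_hash % NUM_BUCKETS]. */
static inline int
route_tuple(const int *bucket_map, uint32_t tuple_hash)
{
    return bucket_map[hash_to_bucket(tuple_hash)];
}

/* When a table's region changes, only buckets whose segment assignment
 * differs between the old and new maps need to move; this is what keeps
 * the movement monotonic and small. Returns the number of such buckets,
 * writing their ids into `moved` (caller provides NUM_BUCKETS slots). */
static int
buckets_to_move(const int *old_map, const int *new_map, int *moved)
{
    int nmoved = 0;
    for (int i = 0; i < NUM_BUCKETS; i++)
        if (old_map[i] != new_map[i])
            moved[nmoved++] = i;
    return nmoved;
}

In this scheme, expanding a table's region from N to N+1 segments only rewrites the map entries that now point at the new segment, so roughly NUM_BUCKETS/(N+1) buckets move and the rest of the data stays in place.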

Catalog changes:
1. Add a catalog pg_region to maintain region information.
2. Add some columns to gp_distribution_policy to maintain the table-region relation and the bucket-to-segment mapping.
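As a sketch of what these catalog changes might hold, expressed as plain C structs with illustrative names and sizes (the real definitions would follow PostgreSQL's catalog conventions):

#include <stdint.h>

/* Hypothetical pg_region row: which segments make up a region. */
typedef struct PgRegion {
    uint32_t region_oid;     /* region oid */
    char     regionname[64]; /* region name, e.g. "r1" */
    int      nsegs;          /* number of segments in the region */
    int16_t  segids[64];     /* the segment ids, e.g. {1, 2, 3, 9} */
} PgRegion;

/* Hypothetical additions to a gp_distribution_policy row. */
typedef struct GpDistributionPolicyExtra {
    uint32_t rel_oid;              /* the table */
    uint32_t region_oid;           /* new: region the table binds to */
    int16_t  bucket_to_seg[4096];  /* new: per-table bucket -> segment map */
} GpDistributionPolicyExtra;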

Challenges

Some foreseeable challenges for us:
1. Redesign gang management, including gang creation and gang reuse, to support variable-size gangs.
2. Involve only the segments a query needs: e.g., an INSERT/UPDATE on a table involves only some of the segments in the region, and two-phase commit commands are not dispatched to all segments.
3. What does "begin" do after introducing dynamic gang size?
4. Generate correct parallel plans.
5. How to maintain the primary writer gang?
6. Metadata copy (catalog, ...) during online expansion.
7. ORCA support.

8. Modify GPDB tools as needed: gpexpand, pg_dump, ...
9. Enhance the upcoming upgrader tool to be downward-compatible, so that existing clusters/databases can continue to work without major changes.

Use Cases

With regions, there are many good use cases, e.g.:

1. Very large clusters. Regions remove the overhead of adding more and more segments to a large cluster, so users can keep adding nodes/segments to build very large clusters that support their growing data sizes and workloads.
2. TP+AP hybrid scenarios. Users can use one segment to host some TP tables while the rest of the segments serve AP tables. Given this, we might optimize the TP workload very efficiently while users can still access both TP and AP tables in real time within one big database.
3. Multi-tenant scenarios. DBAs can create separate regions for different tenants, so that some small tenants are only allowed to store data on a few segments/nodes, while big tenants can store data on the entire cluster. This creates good vertical resource isolation, so that in most cases a tenant's workload runs only on its limited segments/nodes.
4. Enhanced high availability. We could take cluster HA to a new level. Today, when both the primary and the mirror of a segment are down, the whole cluster is out of service. With regions, we might continue to serve DML statements as long as the related tables' regions are available.
