Streamulus
A language for real-time event stream processing
Irit Katriel
MADALGO Seminar, Århus, 14 June 2012

Event Stream
An infinite, ordered sequence of discrete elements

Event Stream Processing
A stream arrives as a sequence of calls to a HandleEvent function
[Figure: three successive HandleEvent(...) calls, each delivering one event]

We need to reason about the forest

We focus on how to write programs (not on algorithms)

Motivating Example: Crossings of Moving Averages
[Figure: chart marking a Death Cross and a Golden Cross]

Cross Detection
[Diagram: Time Series → Slow Decaying Moving Average and Fast Decaying Moving Average → Compare (slow < fast) → Unique (remove repetitions) → Alert]

Let's implement this in an object-oriented style...

Moving Average
template<int DecayFactor>
class Mavg {
  ...
  double Tick(double value) {
    // 'now' is the current time; how it is obtained is elided on the slide
    double alpha = 1 - exp(-DecayFactor * (now - prev_time));
    prev_time = now;
    return mavg = alpha * value + (1 - alpha) * mavg;
  }
  double Get() { return mavg; }

  double mavg;
  clock_t prev_time;
};
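The slide elides where `now` comes from and how the average starts. A minimal runnable sketch of the same update rule follows, under two assumptions that are not on the slide: the caller passes the timestamp in explicitly, and the first sample initialises the average.

```cpp
#include <cassert>
#include <cmath>

// Exponentially decaying moving average, following the slide's update rule.
// Assumption: the caller supplies the timestamp (in seconds), so the class
// does not depend on any particular clock.
template <int DecayFactor>
class Mavg {
public:
    double Tick(double now, double value) {
        assert(first_ || now >= prev_time_);  // timestamps must not go backwards
        if (first_) {
            // Assumption: the first sample initialises the average.
            first_ = false;
            mavg_ = value;
        } else {
            const double alpha = 1.0 - std::exp(-DecayFactor * (now - prev_time_));
            mavg_ = alpha * value + (1.0 - alpha) * mavg_;
        }
        prev_time_ = now;
        return mavg_;
    }
    double Get() const { return mavg_; }

private:
    bool first_ = true;
    double mavg_ = 0.0;
    double prev_time_ = 0.0;
};
```

With a larger DecayFactor, alpha is closer to 1 for the same time gap, so Mavg<10> follows new values faster than Mavg<1>.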

Cross Detection Class
class CrossDetection {
  ...
  void Tick(double value) {
    bool comp = (slow.Tick(value) < fast.Tick(value));
    if (comp != prev_comp) IssueCrossingAlert(comp);
    prev_comp = comp;
  }
  Mavg<1> slow;
  Mavg<10> fast;
  bool prev_comp;
};

What if the moving averages are also needed elsewhere?

Refactored Cross Detection Class
class CrossDetection {
public:
  // Construct mavgs elsewhere and pass in references.
  CrossDetection(Mavg<1>& slow_, Mavg<10>& fast_) : slow(slow_), fast(fast_) { }

  // Update mavgs elsewhere. Here only probe.
  void UpdateValue() {
    bool comp = (slow.Get() < fast.Get());
    if (comp != prev_comp) IssueCrossingAlert(comp);
    prev_comp = comp;
  }

  Mavg<1>& slow;
  Mavg<10>& fast;
  bool prev_comp;
};

Using the Refactored Class
// setup
Mavg<10> fast_mavg;
Mavg<1> slow_mavg;
CrossDetection cross_detection(slow_mavg, fast_mavg);
SomethingElse something_else(slow_mavg, fast_mavg);

// process an event
void HandleEvent(double value) {
  slow_mavg.Tick(value);
  fast_mavg.Tick(value);
  cross_detection.UpdateValue();  // implicit data
  something_else.UpdateValue();   // dependencies
}

This was noticed before
From "The 8 Requirements of Real-Time Stream Processing", Stonebraker, Çetintemel, Zdonik. SIGMOD Record, 2005:

"Historically, for streaming applications, general purpose languages such as C++ or Java have been used as the workhorse development and programming tools. Unfortunately, relying on low-level programming schemes results in long development cycles and high maintenance costs."

And they conclude with the requirement "Query using StreamSQL", where they probably meant: a Domain-Specific Language.

StreamSQL
SELECT avg(some_column) as AvgValue
FROM input [rows 20]    -- sliding window: last 20 entries
WHERE some_condition
GROUP BY another_column

● StreamBase
● Esper
● Sybase Aleri
● Microsoft StreamInsight
● ...

Can include user-defined operators

Returning to Our Problem
// setup
Mavg<10> fast_mavg;
Mavg<1> slow_mavg;
CrossDetection cross_detection(slow_mavg, fast_mavg);
SomethingElse something_else(slow_mavg, fast_mavg);

// process an event
HandleEvent(double value) {
  slow_mavg.Tick(value);
  fast_mavg.Tick(value);
  cross_detection.UpdateValue();  // implicit data
  something_else.UpdateValue();   // dependencies
}

The Streamulus Way
// setup
InputStreamT ts = NewInputStream("TS");
SubscriptionT slow = Subscribe(Mavg<1>(ts));
SubscriptionT fast = Subscribe(Mavg<10>(ts));
Subscribe( cross_alert( unique( slow < fast ) ) );
Subscribe( something_else(slow,fast) );

// process an event
HandleEvent(double value) { InputStreamPut(ts, value); }

Setup Constructs the Graph
Subscribe( cross_alert( unique( slow_mavg(ts) < fast_mavg(ts) ) ) );
[Diagram: Time Series → Slow Decaying Moving Average → Compare (slow < fast) → Unique (remove repetitions) → Alert]

Inputs Propagate Automatically Through the Graph
HandleEvent(double value) { InputStreamPut(ts, value); }
[Diagram: Time Series → Slow Decaying Moving Average → Compare (slow < fast) → Unique (remove repetitions) → Alert]

User-Defined Functions
What are Mavg, unique and cross_alert?
● Write a functor F that handles a single event
● Streamify it.

cross_alert is Streamify<cross>
struct cross {
  // Boost result_of protocol (not needed in C++11)
  template<class Sig> struct result { typedef bool type; };

  // Process event
  bool operator()(bool golden) {
    std::cout << (golden ? "Golden" : "Death");
    std::cout << " Cross" << std::endl;
    return golden;
  }
};

unique is Streamify<unique_func>
struct unique_func {
  unique_func() : mFirst(true) {}

  // Boost result_of protocol (not needed in C++11)
  template<class Sig> struct result { typedef bool type; };

  // Will there be an output? (optional)
  bool Filter(bool value) const { return mFirst || (value != mPrev); }

  // Value of the next output
  bool operator()(bool value) { mFirst = false; return mPrev = value; }

private:
  bool mFirst;
  bool mPrev;
};
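To see how Filter() and operator() work together, here is a small standalone sketch. The RunFiltered driver is hypothetical and is not part of Streamulus; it only assumes that operator() is called when, and only when, Filter() returns true.

```cpp
#include <cassert>
#include <vector>

// The single-event functor from the slide (without the result_of boilerplate).
struct unique_func {
    unique_func() : mFirst(true), mPrev(false) {}
    // Will there be an output for this input?
    bool Filter(bool value) const { return mFirst || (value != mPrev); }
    // Value of the next output.
    bool operator()(bool value) { mFirst = false; return mPrev = value; }
private:
    bool mFirst;
    bool mPrev;
};

// Hypothetical driver (not Streamulus code): feeds events one by one and
// calls operator() only when Filter() says there will be an output.
template <typename F>
std::vector<bool> RunFiltered(F f, const std::vector<bool>& inputs) {
    std::vector<bool> outputs;
    for (bool in : inputs) {
        if (f.Filter(in)) outputs.push_back(f(in));
    }
    return outputs;
}
```

Feeding the sequence true, true, false, false, false, true through this driver keeps only the first event of each run: true, false, true.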

How does it work?
There are two things to talk about:
● The graph data structure
  – How the data propagates through it
● The Subscribe() function
  – How it turns expressions into a graph

The Streamulus Engine
● Maintains the Graph
  – Nodes have operators
  – Edges have buffers
● Propagates inputs by activating nodes in a safe order

What is a safe order?
[Figure: graph for (X+1)/(X+2), shown in four steps. With X = 0 the output is 1/2. X then changes to 1. If / is activated after only the first + has been updated, it emits the intermediate value 2/2 = 1. Once the second + is also updated, the output is 2/3.]
Both + nodes should be activated before /. In other words: topological order.
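One standard way to obtain such an order is Kahn's algorithm. The sketch below is an illustration only; the slides do not say which algorithm Streamulus uses, and the adjacency-list representation and the name TopologicalIndex are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Kahn's algorithm. edges[u] lists the nodes that consume u's output.
// Returns index[v] = position of node v in a topological order.
std::vector<int> TopologicalIndex(const std::vector<std::vector<int>>& edges) {
    const std::size_t n = edges.size();
    std::vector<int> indegree(n, 0);
    for (const auto& outs : edges)
        for (int v : outs) ++indegree[v];

    std::queue<int> ready;  // nodes whose inputs have all been ordered
    for (std::size_t v = 0; v < n; ++v)
        if (indegree[v] == 0) ready.push(static_cast<int>(v));

    std::vector<int> index(n, -1);
    int next = 0;
    while (!ready.empty()) {
        const int u = ready.front();
        ready.pop();
        index[u] = next++;
        for (int v : edges[u])
            if (--indegree[v] == 0) ready.push(v);
    }
    assert(next == static_cast<int>(n));  // otherwise the graph has a cycle
    return index;
}
```

For (X+1)/(X+2), with nodes numbered X = 0, first + = 1, second + = 2 and / = 3, the edge lists are {{1, 2}, {3}, {3}, {}}, and the / node receives a larger index than both + nodes.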

The Streamulus Data Structure
[Figure: graph for X*(X+2Y) with nodes X, Y, x2, + and *, next to the priority queue of active nodes]
Priority Queue of Active Nodes
Priority = { TimeStamp, Index }
● TimeStamp of oldest incoming data
● Index of the node in topological order
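A queue with this priority can be sketched with std::priority_queue over a std::pair, which compares lexicographically. The type names and the choice to pop the oldest timestamp first are assumptions made for illustration, not the Streamulus source.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Priority = { TimeStamp, Index }, compared lexicographically by std::pair.
using Priority = std::pair<long, int>;

// Min-heap: the smallest {TimeStamp, Index} is popped first.
using ActiveNodes =
    std::priority_queue<Priority, std::vector<Priority>, std::greater<Priority>>;

// Pops the next node to activate and returns its topological index.
int PopNext(ActiveNodes& active) {
    assert(!active.empty());
    const int index = active.top().second;
    active.pop();
    return index;
}
```

Among nodes waiting on data with the same timestamp, the one that comes earlier in topological order is activated first, which is exactly the safe order.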

What's in a Node?
class Strop // for STReam Operator
{
  ...
  virtual bool Work()=0; // return true if emitted output
};
Also has context data members:
● Pointer to the engine
● Identifier of its node in the graph
● Its topological order index

Streamify
We had: unique is Streamify<unique_func>
Streamify takes a single-event functor (or object) and creates a strop that does the right thing.
You can create your own strops directly, but for most purposes Streamify should suffice.

InputStream
● A special kind of Strop.
● Has a Tick(value) function
  – Called from outside of Streamulus
  – Causes the node to emit value to its output

InputStream<T>::type ts = NewInputStream<T>("TS");
InputStreamPut(ts, value);   // calls ts's Tick function

How Data Propagates
● Engine's Main Loop (single threaded):
    While ActiveNodes is not empty:
      v = ActiveNodes.Pop()
      v.Work()    // might insert nodes to ActiveNodes
● When the queue is empty, the engine idles
  – No busy waiting
● When an input Tick()s, the engine is activated
  – Resumes its main loop
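The loop can be sketched as a miniature engine. This is a hypothetical illustration, not the Streamulus implementation: nodes are identified by their topological index, timestamps are left out, and Work() reports which downstream nodes it activates.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical miniature engine (not the Streamulus implementation).
// Nodes are identified by their topological index. A node's work function
// recomputes its value and returns the indices of the nodes that consume it.
class MiniEngine {
public:
    using WorkFn = std::function<std::vector<int>()>;

    int AddNode(WorkFn work) {
        nodes_.push_back(std::move(work));
        return static_cast<int>(nodes_.size()) - 1;
    }

    // Called when an input ticks: mark the node active and resume the loop.
    void Activate(int node) {
        assert(node >= 0 && node < static_cast<int>(nodes_.size()));
        active_.insert(node);
        Run();
    }

private:
    void Run() {
        while (!active_.empty()) {
            const int v = *active_.begin();  // lowest topological index first
            active_.erase(active_.begin());
            for (int next : nodes_[v]()) active_.insert(next);
        }
        // Queue empty: return to the caller, no busy waiting.
    }

    std::vector<WorkFn> nodes_;
    std::set<int> active_;  // ordered set doubles as a priority queue
};
```

Because the active set is ordered and holds each node at most once, a node fed by two updated inputs runs a single time, after both of them.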

Subscribe()
[Figure: the expression "f(x,g(y))" and its graph: Y → g → f ← X]
Easy. Anyone with a parser and a stack can do it, if all the edges carry the same type.

Subscribe()
[Figure: the expression "f(x,g(y))" and its graph: Y → g → f ← X]
But what is the type of g(y)
● When y is int?
● When y is a user-defined type?

Two Options
● Avoid the problem
  – Generic container (union, variant): waste space
  – Serialisation: waste time
  – void pointers: unsafe
● Solve the problem
  – Compute the data type of each edge
  – Allocate a buffer for that type
  – How? C++ Template Metaprogramming

Metaprogramming
Writing code that generates or manipulates code
● Compilers
● Source code generators
● Self-modifying programs

C++ Templates
Designed for Generic Programming
template<typename T> T max(T a, T b) { return a > b ? a : b; }

Paradigm              Type resolution
Generic Programming   During compilation
Polymorphism          At runtime

C++ Template Metaprogramming
"Programming with types"
Metafunctions map types to types:
template<typename T1, typename T2>
struct VectorOfPairs {
  typedef std::vector<std::pair<T1,T2> > type;
};

typedef is an assignment:
typedef VectorOfPairs<int,char>::type my_vector;

C++ Template Metaprogramming
Recursive metafunctions make the compiler compute stuff:
template<int i>
struct Factorial {
  static const int value = i * Factorial<i-1>::value;
};
template<>
struct Factorial<1> { // base case
  static const int value = 1;
};
int five_factorial = Factorial<5>::value;
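A static_assert is a quick way to confirm that the value really is computed by the compiler. The sketch below is the same metafunction, written out in full.

```cpp
// Recursive metafunction: the compiler evaluates Factorial<5>::value.
template <int i>
struct Factorial {
    static const int value = i * Factorial<i - 1>::value;
};

template <>
struct Factorial<1> {  // base case
    static const int value = 1;
};

// Checked during compilation; no code runs for this.
static_assert(Factorial<5>::value == 120, "5! is computed at compile time");
```

If the base case were missing, the recursion would never terminate and compilation would fail at the template instantiation depth limit.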

C++ Template Metaprogramming
Control flow via template specialization:
template<typename T>
struct IntToDouble { typedef T type; };
template<>
struct IntToDouble<int> { typedef double type; };

IntToDouble<int>::type    // == double
IntToDouble<char>::type   // == char (unchanged)
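The two comments can be turned into compile-time checks with std::is_same from <type_traits> (C++11). A minimal sketch:

```cpp
#include <type_traits>

template <typename T>
struct IntToDouble { typedef T type; };            // general case: unchanged

template <>
struct IntToDouble<int> { typedef double type; };  // specialisation for int

static_assert(std::is_same<IntToDouble<int>::type, double>::value,
              "int maps to double");
static_assert(std::is_same<IntToDouble<char>::type, char>::value,
              "char is unchanged");
```

The specialisation plays the role of an if branch: the compiler picks it when the argument is int and falls back to the general template otherwise.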

C++ Template Metaprogramming
Compile-Time Data Structures
Linked List of Types
struct End {};
template<typename T, typename NEXT>
struct Node {
  typedef T type;
  typedef NEXT next;
};
typedef Node<int, Node<char, Node<double, End> > > List;

C++ Template Metaprogramming
Insert types into a list:
template<typename T, typename LIST>
struct Push {
  typedef Node<T, LIST> type;
};
typedef Push<int, End>::type L1;
typedef Push<char, L1>::type L2;
typedef Push<double, L2>::type L3;

C++ Template Metaprogramming
Compute the length of a list:
template<typename LIST>
struct Size;
template<typename T, typename NEXT>
struct Size<Node<T, NEXT> > {
  static const int value = 1 + Size<NEXT>::value;
};
template<>
struct Size<End> {
  static const int value = 0;
};
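Put together, the list, Push and Size pieces form a complete compile-time program. In the sketch below the element types int, char and double are placeholders chosen for illustration.

```cpp
struct End {};

template <typename T, typename NEXT>
struct Node {
    typedef T type;
    typedef NEXT next;
};

// Push: prepend a type to a list.
template <typename T, typename LIST>
struct Push {
    typedef Node<T, LIST> type;
};

// Size: length of a list, by recursion on its tail.
template <typename LIST>
struct Size;

template <typename T, typename NEXT>
struct Size<Node<T, NEXT> > {
    static const int value = 1 + Size<NEXT>::value;
};

template <>
struct Size<End> {
    static const int value = 0;
};

// Placeholder element types; any types would do.
typedef Push<int, End>::type L1;
typedef Push<char, L1>::type L2;
typedef Push<double, L2>::type L3;
static_assert(Size<L3>::value == 3, "three types were pushed");
```

Nothing here exists at runtime: the list is a type, and its length is a constant the compiler works out while instantiating Size.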

Useful Boost Libraries
● MPL (Aleksey Gurtovoy and David Abrahams)
  – Utilities, Data Structures, Sequences, Iterators
● Fusion (Joel de Guzman, Dan Marsden, Tobias Schwinger)
  – Heterogeneous containers: fusion::vector<...> my_vector;
● Proto (Eric Niebler + Joel Falcou, Christophe Henry)
  – A framework for building Domain-Specific Embedded Languages in C++

Using Proto
● Define a grammar
  – Which expressions are valid?
● Define transformations
  – What should become of each sub-expression?
● Activate the grammar on an expression

Operator Overloading in C++
class MyType { ... };
class YourType { ... };
class OurType { ... };

OurType operator+(MyType mine, YourType yours) {
  return ...; // Compute an OurType from the inputs
}

MyType mine;
YourType yours;
OurType ours = mine + yours;

Expression → Tree
Proto defines a static expression type proto::expr<Tag, Args, Arity> and overloads all operators for it.

For example: expr1 - expr2 returns something like proto::expr<tag::minus, list2<expr1, expr2>, 2>

Expression Tree for -(2+3)
[Figure: proto::expr tree: tag::negate → tag::plus → (tag::terminal 2, tag::terminal 3)]
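The idea of an operator that returns a tree instead of a value can be shown without Proto. The following is a hand-rolled sketch: the namespace mini and all names in it are made up for illustration and are not Proto's types.

```cpp
#include <cassert>
#include <type_traits>

// Hand-rolled illustration (not Boost.Proto): operators return a type that
// records the expression instead of computing a value.
namespace mini {

struct PlusTag {};
struct NegateTag {};

struct Terminal {
    int value;
    int Eval() const { return value; }
};

template <typename Tag, typename Left, typename Right = void>
struct Expr;

template <typename Left, typename Right>
struct Expr<PlusTag, Left, Right> {  // binary node
    Left left;
    Right right;
    int Eval() const { return left.Eval() + right.Eval(); }
};

template <typename Child>
struct Expr<NegateTag, Child, void> {  // unary node
    Child child;
    int Eval() const { return -child.Eval(); }
};

// Found by argument-dependent lookup for operands declared in mini.
template <typename L, typename R>
Expr<PlusTag, L, R> operator+(const L& l, const R& r) {
    return Expr<PlusTag, L, R>{l, r};
}

template <typename C>
Expr<NegateTag, C> operator-(const C& c) {
    return Expr<NegateTag, C>{c};
}

}  // namespace mini
```

Given terminals two and three, the expression -(two + three) has the type Expr<NegateTag, Expr<PlusTag, Terminal, Terminal>>, so the shape of the tree is known to the compiler; calling Eval() on it walks the tree bottom-up and yields -5.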

The proto::expr Type
template< typename Tag,   // what this node does
          typename Args,  // who it does it to
          long Arity >
struct expr;

template< typename Tag, typename Args >
struct expr< Tag, Args, 1 > {   // unary expression
  typedef typename Args::child0 proto_child0;
  proto_child0 child0;
  // ...
};
...   // specialisations for other arities

Creating Proto Expressions
● Define proto terminals
    proto::terminal<int>::type x = {12};
● x is a proto expression → So is any expression involving x
    ~((x+12)/x & 0xff)

Function call expressions
proto::expr<tag::function, ...>
● First arg is a proto::terminal<F>::type
  – Identifies the function
● Then the function's arguments
  – Arbitrary proto::expr's

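A Proto Grammar
Recursive definition of valid expressions
struct arithmetic : proto::or_<
    proto::plus<arithmetic, arithmetic>
  , proto::minus<arithmetic, arithmetic>
  , proto::multiplies<arithmetic, arithmetic>
  , proto::divides<arithmetic, arithmetic>
  , proto::terminal<proto::_>    // anything
> {};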

A Grammar With Transforms
struct arithmetic : proto::or_<
    proto::when< proto::plus<arithmetic, arithmetic>,
                 Plus(arithmetic(proto::_left), arithmetic(proto::_right)) >
  , proto::when< proto::minus<arithmetic, arithmetic>,
                 Minus(arithmetic(proto::_left), arithmetic(proto::_right)) >
  ...
  , proto::when< proto::terminal<proto::_>, proto::_value >
> {};

A Transform
A functor that publishes its return type as result_type
struct Plus : proto::callable {
  typedef int result_type;
  int operator()(int left, int right) { return left+right; }
};

Invoking a Grammar
proto::terminal<int>::type x = {12};
int result = arithmetic() ( -(x+1) );

Apply the transforms bottom-up
[Figure: expression tree tag::negate → tag::plus → (tag::terminal 12, tag::terminal 1). The terminals evaluate first, to _value=12 and _value=1; then tag::plus evaluates to Plus(12,1)=13.]

This example is not very useful. It's what C++ already does.

The Streamulus Grammar
● Identifies all operators, as well as user-defined functions.
● Each transform
  – Creates a strop for the node's operator/func
  – Inserts it to the graph
  – Connects it to child-nodes' strops
    ● Which were created recursively
  – Returns a pointer to the new strop

Subscribe()
InputStream<T>::type x = NewInputStream<T>("X");
Subscribe ( -(x+1) );

[Figure: the expression tree tag::negate → tag::plus → (tag::terminal "X", tag::terminal 1) is transformed bottom-up in four steps. The terminals become InputStreamStrop(x) and ConstStrop(1), then tag::plus becomes PlusStrop, and finally tag::negate becomes NegateStrop.]

Subscribe()
Finally:
● Compute topological order
● Link the nodes to the engine
[Figure: the strops with their topological order (TO) indices, all linked to the Streamulus Engine: InputStreamStrop(x) TO=1, ConstStrop(1) TO=2, PlusStrop TO=3, NegateStrop TO=4]

Status
● First Release - soon
● User Manual – eventually
  – Nagging will help
● There's a lot to do
  – Improve it (e.g., multi-core version)
  – Apply it
  – It's open-source, join in.



Links
● www.streamulus.com
  – Link to github from there
● Follow @streamulus on twitter
  – Infrequent notifications (releases, news, etc)
