Is it legal to use the APIs of Meetup.com, Eventbrite, etc. to aggregate their event data into one place, similar to Google?

I have a predicament. I’m trying to build a platform that will basically be a place where all events, online/offline, are listed.

Initially, my platform will obviously have zero events, because nobody knows about it. And for anyone to use it, there need to be events on there: a classic chicken-and-egg scenario.

My thought was to aggregate events onto my platform (and link to their source), whilst also allowing users to create events on MY platform. Eventually, if mine gains enough traction, I could stop listing events from other sources. So I am definitely competing with these sites.

Google is pulling data from many different event sources and displaying it on its own pages, like so: https://www.google.com/search?q=events+in+london+meetup&rlz=1C5CHFA_enUA720GB740&oq=events+&aqs=chrome.0.69i59l3j69i60l3j69i65l2.4289j1j9&sourceid=chrome&ie=UTF-8&ibp=htl;events&rciv=evn&sa=X&ved=2ahUKEwiYtuPx2bLuAhU3REEAHaRkCZ0Q5rwDKAJ6BAgNEA4&sxsrf=ALeKk03Uj9jeI8lTj0__V-UWcgcv_pdFew:1611427139353#htivrt=events&htidocid=L2F1dGhvcml0eS9ob3Jpem9uL2NsdXN0ZXJlZF9ldmVudC8yMDIxLTAyLTAzfDE2ODA2NzEwNzgyNjAxNDg4MTQ2&fpstate=tldetail

The difference is that Google, I assume, is probably scraping the data rather than using the APIs.

The Meetup API Terms state:

"Not use the Meetup API for any commercial purpose without the express written consent of Meetup;

Not undermine our commercial interests or make unreasonable commercial uses of the Meetup API, such as by substantially replicating our Platform or significant aspects of the Platform, to be determined in Meetup’s sole discretion;

While you may charge for any application you develop (subject to Meetup’s consent), you may not sell, lease, or sublicense the Meetup API;"

I know aggregating is a bit of a legal grey area, but I wanted to ask for opinions on whether this would be legal, and if not, how the hell do I get any traction and users without having any content?

Note: This is in the UK.

How to count in relational algebra without aggregate functions?


Find the user who has liked the most posts. Report the user’s id, name and email, and the id of the posts they have liked. If there is a tie, report them all.

The schema:

User(uid, name, website, about, email, phone, photo)
Likes(liker, pid, when)
Likes[liker] is contained in User[uid]

I think I need a "function" NumberOfLikes or similar where I can do something like:

[image: the intended relational-algebra expression using NumberOfLikes]

We aren’t allowed to use aggregate functions in this exercise. I assume the way to count in RA is by performing some sort of cross product black magic, but I don’t know how.

Help?

btc.ovh – Cryptocurrencies – an aggregate of content.

Why are you selling this site?
Because I need money.
If it sells today, the price may be slightly reduced.
We can negotiate.

How is it monetized?
Affiliate links + AdSense/banners.
(The large blank areas on the page are space reserved for AdSense; the site is awaiting activation.)

Does this site come with any social media accounts?
No.
----
100% automatic, with a Twitter autopost bot: an easy way to get traffic from Twitter.
The site uses free APIs.
Easy to configure via the backend.

Twitter + FB login

Many…


Aggregate Multiple Instances of Each Row Without Multiple Seq Scans

I am trying to perform some mathematical operations in PostgreSQL that involve calculating multiple values from each row, then aggregating, without requiring multiple Seq Scans over the whole table. Performance is critical for my application, so I want this to run as efficiently as possible on large data sets. Are there any optimizations I can make so that PostgreSQL uses only a single Seq Scan?

Here’s a simplified example:

Given this test data set:

postgres=> CREATE TABLE values (value int);
postgres=> INSERT INTO values (value) SELECT * from generate_series(-500000,500000);
postgres=> SELECT * FROM values;
  value
---------
 -500000
 -499999
 -499998
 -499997
 -499996
...
  499996
  499997
  499998
  499999
  500000

I want to run a query that counts two instances of each row: once by the value column and once by abs(value). I'm currently accomplishing this with a CROSS JOIN:

SELECT
  CASE idx
    WHEN 0 THEN value
    WHEN 1 THEN abs(value)
  END,
  COUNT(value)
FROM values
CROSS JOIN LATERAL unnest(ARRAY[0,1]) idx
GROUP BY 1;

Here’s the EXPLAIN ANALYZE result for this query. Notice the loops=2 in the Seq Scan line:

postgres=> EXPLAIN ANALYZE SELECT ...
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=82194.40..82201.40 rows=400 width=12) (actual time=997.448..1214.576 rows=1000001 loops=1)
   Group Key: CASE idx.idx WHEN 0 THEN "values".value WHEN 1 THEN abs("values".value) ELSE NULL::integer END
   ->  Nested Loop  (cost=0.00..70910.65 rows=2256750 width=8) (actual time=0.024..390.070 rows=2000002 loops=1)
         ->  Function Scan on unnest idx  (cost=0.00..0.02 rows=2 width=4) (actual time=0.005..0.007 rows=2 loops=1)
         ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.012..82.584 rows=1000001 loops=2)
 Planning Time: 0.073 ms
 Execution Time: 1254.362 ms

I compared this to the case of only using 1 instance of each row rather than 2. The 1 instance query performs a single Seq Scan and runs ~50% faster (as expected):

postgres=> EXPLAIN ANALYZE SELECT
postgres->   value,
postgres->   COUNT(value)
postgres-> FROM values
postgres-> GROUP BY 1;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=21350.62..21352.62 rows=200 width=12) (actual time=444.381..662.952 rows=1000001 loops=1)
   Group Key: value
   ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.015..84.494 rows=1000001 loops=1)
 Planning Time: 0.044 ms
 Execution Time: 702.806 ms
(5 rows)

I want to scale this up to a much larger data set, so performance is critical. Are there any optimizations that would let my original query run with only one Seq Scan? I've tried tweaking query plan settings (enable_nestloop, work_mem, etc.) without success; see Other Attempts below.

Other Attempts

Here are some other approaches I tried:

  1. Using UNION still performs 2 Seq Scans:

SELECT
  value,
  COUNT(value)
FROM (
  SELECT value FROM values
  UNION
  SELECT abs(value) AS value FROM values
) tbl
GROUP BY 1;

postgres=> EXPLAIN ANALYZE ...
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=130150.31..130152.31 rows=200 width=12) (actual time=1402.221..1513.000 rows=1000001 loops=1)
   Group Key: "values".value
   ->  HashAggregate  (cost=73731.56..96299.06 rows=2256750 width=4) (actual time=892.904..1112.867 rows=1000001 loops=1)
         Group Key: "values".value
         ->  Append  (cost=0.00..68089.69 rows=2256750 width=4) (actual time=0.025..343.921 rows=2000002 loops=1)
               ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.024..86.299 rows=1000001 loops=1)
               ->  Seq Scan on "values" values_1  (cost=0.00..18529.69 rows=1128375 width=4) (actual time=0.013..110.885 rows=1000001 loops=1)
 Planning Time: 0.067 ms
 Execution Time: 1598.531 ms
  2. Using PL/pgSQL. This performs only one Seq Scan, but array operations in PL/pgSQL are very slow, so this actually executes more slowly than the original:

CREATE TEMP TABLE result (value int, count int);
DO LANGUAGE PLPGSQL $$
  DECLARE
    counts int8[];
    row record;
  BEGIN
    counts = array_fill(0, ARRAY[500000]);
    FOR row IN (SELECT value FROM values) LOOP
      counts[row.value] = counts[row.value] + 1;
      counts[abs(row.value)] = counts[abs(row.value)] + 1;
    END LOOP;

    FOR i IN 0..500000 LOOP
      CONTINUE WHEN counts[i] = 0;
      INSERT INTO result (value, count) VALUES (i, counts[i]);
    END LOOP;
  END
$$;
SELECT value, count FROM result;

postgres=> \timing
Timing is on.
postgres=> DO LANGUAGE PLPGSQL $$ ...
DO
Time: 2768.611 ms (00:02.769)
  3. Tweaking query plan configuration. I tried changing enable_seqscan, enable_nestloop, work_mem, and the cost constants, and could not find a configuration that performed better than the original.
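
One more rewrite worth considering (a sketch only, not verified at scale): aggregate per value in a single pass, then fan the per-value counts out to both group keys and re-aggregate. Because PostgreSQL materializes a CTE that is referenced more than once, the base table should be scanned only once:

WITH per_value AS (
  -- the only Seq Scan over the base table
  SELECT value, COUNT(value) AS cnt
  FROM values
  GROUP BY value
)
SELECT grp, SUM(cnt)
FROM (
  -- fan each per-value count out to both group keys
  SELECT value AS grp, cnt FROM per_value
  UNION ALL
  SELECT abs(value) AS grp, cnt FROM per_value
) fanned
GROUP BY grp;

This should match the semantics of the original CROSS JOIN query (each row counted once under value and once under abs(value)), but whether the extra aggregation pass is a net win will depend on the data.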

Cluster or shard collections in logical aggregate groups

We will have date-specific trading data for about 10M trades, with each day's data comprising ~1 TB spread across different collections such as market, trade, and settlements.
Since we will not need more than 3 days of data at any point in time, the intention is to delete all data pertaining to T-3 or older.
There are two possible options:

1. Each trading day represented as a separate database, with the standard set of collections (marketdata, trade, settlementdata) in each database.
2. Every collection name appended with the date, e.g. marketdata_10032019, marketdata_10042019, etc.

The first option seems better as:

1. It is much cleaner in terms of maintenance: we just drop the obsolete databases, rather than scanning collections by name.
2. Dynamic collection names, as in the second option, put severe restrictions on aggregation possibilities: MongoDB aggregation does not support dynamic collection names.

I would appreciate further views based on performance, concurrency, scalability, clustering/sharding, maintenance, etc.

I do not see any existing question that addresses the use case of trying to cluster/maintain on the basis of a second-level, logically aggregated grouping of collections by date (or any other attribute).

DDD – how to model an aggregate using data from 2 other aggregates to make a business decision

I'm struggling to find a proper way to model this scenario. I have three different aggregates within the same Bounded Context:

  1. A Student
  2. A University
  3. A University of Interest

public class Student : Entity, IAggregateRoot
{
    public string Name { get; }
    public string StateAbbreviaiton { get; }
    ...
}

public class University : Entity, IAggregateRoot
{
    public string Name { get; }
    public string StateAbbreviaiton { get; }
    ...
}

public class UniveristyOfInterest : Entity, IAggregateRoot
{
    public Guid StudentId { get; }
    public Guid UniversityId { get; }
    public ResidencyType ResidencyType { get; }
    ...
}

A UniveristyOfInterest is created when a student selects a University they are interested in attending. It is an Entity because it will ultimately contain much more information about the experience the user could have with the University, including financial data, ROI calculations, etc. Each UniveristyOfInterest for a Student will be saved in some repository.

UniveristyOfInterest has an Enumeration called ResidencyType, which has three possible values: InState, OutState, and Unknown. The business rule is: if the Student's StateAbbreviation is the same as the University's StateAbbreviation, then ResidencyType is InState; otherwise it is OutState (assuming we have valid values for both Student and University).

The UniversityOfInterest aggregate must contain the business rules for determining ResidencyType. All of the research I've done recommends that aggregates only know of other aggregates by their Id values (no references to foreign aggregates). My UniversityOfInterest constructor is passed StudentId and UniversityId. How do I reach back and get their respective StateAbbreviation values so I can properly enforce the business rule for determining ResidencyType inside the UniversityOfInterest aggregate?

I thought about also passing the stateAbbreviation of both Student and University into the constructor of UniversityOfInterest, but that seems clunky.

Any suggestions on how to properly enforce a business rule (determining ResidencyType) that requires data from foreign aggregates within the same Bounded Context?
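
One shape I have been sketching (ResidencyService, the factory method, and the repository interfaces below are hypothetical names, not my actual code): a domain service does the id lookups and hands the two state abbreviations to a factory on the aggregate, so the rule itself stays inside UniveristyOfInterest and only the service ever sees the clunky extra parameters. It builds on the types from my snippet above:

using System;

// Hypothetical repository abstractions for the id lookups.
public interface IStudentRepository { Student GetById(Guid id); }
public interface IUniversityRepository { University GetById(Guid id); }

// Hypothetical domain service: resolves the foreign aggregates so that
// UniveristyOfInterest never holds references to them.
public class ResidencyService
{
    private readonly IStudentRepository _students;
    private readonly IUniversityRepository _universities;

    public ResidencyService(IStudentRepository students, IUniversityRepository universities)
    {
        _students = students;
        _universities = universities;
    }

    public UniveristyOfInterest CreateUniversityOfInterest(Guid studentId, Guid universityId)
    {
        var student = _students.GetById(studentId);
        var university = _universities.GetById(universityId);

        // pass the abbreviations as plain values, not aggregate references
        return UniveristyOfInterest.Create(
            studentId, universityId,
            student.StateAbbreviaiton, university.StateAbbreviaiton);
    }
}

And the factory on the aggregate, keeping the rule where it belongs:

// Added to UniveristyOfInterest (sketch): the ResidencyType rule lives here.
public static UniveristyOfInterest Create(
    Guid studentId, Guid universityId, string studentState, string universityState)
{
    // Unknown when either side lacks a valid state abbreviation
    if (string.IsNullOrWhiteSpace(studentState) || string.IsNullOrWhiteSpace(universityState))
        return new UniveristyOfInterest(studentId, universityId, ResidencyType.Unknown);

    var residency = studentState == universityState
        ? ResidencyType.InState
        : ResidencyType.OutState;

    return new UniveristyOfInterest(studentId, universityId, residency);
}

The trade-off is that creation has to be routed through the service, but the aggregate still stores only the two ids plus the derived value.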

DDD – operation on one aggregate that creates another aggregate

Let's say I am designing a TODO application and therefore have an aggregate root called Task. The business requires keeping a list of TaskLogEvent entries that provides a history of how the task changed over time. As a Task may have hundreds of these events, I will model TaskLogEvent as a separate aggregate root (I do not want to load these elements every time, and I am not using any lazy-loading mechanism).

Now, any time I call task.complete() or any other operation modifying the Task, I want to create a new TaskLogEvent. I have these options:

  1. Design a domain service that makes all changes to Task and creates the event (sketched below). Every communication with Task would have to go through this service.
  2. Pass TaskLogEventRepository to any method in Task so that the Task itself can create the Event and save it into the repository.
  3. Let an application service handle this. I don’t think that is a good idea.

What would be the ideal solution to this situation? Where did I make a mistake in my thinking process?
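
For reference, here is a minimal sketch of what option 1 might look like (TaskService, ITaskRepository, and ITaskLogEventRepository are assumed names, not existing code):

using System;

// Hypothetical repository abstractions.
public interface ITaskRepository { Task GetById(Guid id); void Save(Task task); }
public interface ITaskLogEventRepository { void Save(TaskLogEvent logEvent); }

// Domain service: every mutation of Task is funneled through methods like
// Complete, so the matching TaskLogEvent is always created alongside it.
public class TaskService
{
    private readonly ITaskRepository _tasks;
    private readonly ITaskLogEventRepository _taskLogEvents;

    public TaskService(ITaskRepository tasks, ITaskLogEventRepository taskLogEvents)
    {
        _tasks = tasks;
        _taskLogEvents = taskLogEvents;
    }

    public void Complete(Guid taskId)
    {
        var task = _tasks.GetById(taskId);
        task.Complete();          // the aggregate still owns its invariants
        _tasks.Save(task);

        // the log entry is its own aggregate, persisted separately
        // (TaskLogEvent constructor shape is assumed)
        _taskLogEvents.Save(new TaskLogEvent(taskId, "Completed", DateTime.UtcNow));
    }
}

The obvious cost is that nothing may call task.complete() directly any more; every Task operation needs a mirror method on the service.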

How to split Aggregate Roots that have equal identity?

In DDD, creating smaller aggregates is encouraged where possible, but in a system we are designing we are running into an issue regarding the identity of aggregates.

The system revolves around physical events, like music festivals or sports events, and within these events there are separate bounded contexts, each representing a business use case. For example, there can be Accreditation and Badges.

Solution One

In a first design of the system we had modeled an Event aggregate (see Figure 1), where all business logic around Accreditation, Badges, and future business use cases would live.


Figure 1: Event aggregate

Solution Two

However, this would result in a huge aggregate, so we decided it would be best to split it up into smaller aggregates (see Figure 2). The Event itself was no longer a concept, because we had split it into separate Accreditation and Badges aggregates. However, because they all refer to the same physical event, the only identifier we could come up with was the same eventId.

Implementing this with event sourcing would also raise the issue that there are multiple event streams with the same identifier.


Figure 2: Split aggregates with shared identity.

Solution Three

Another solution would be the "natural" DDD approach, where we would treat the different modules as Entities with their own identity (see Figure 3). This, however, feels very unnatural and does not represent the actual domain logic. In the implementation we would therefore also need a lookup table of some sort to map the eventId to the required moduleId (see Figure 4).


Figure 3: Split aggregates with own identity.


Figure 4: Lookup table that maps the event to their modules.

Question

The question in this case is: which of the above solutions seems the most efficient and DDD-like approach?
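
On the event-sourcing concern from Solution Two, one mitigation we have been considering (an assumption on our side, not settled design) is to namespace stream names by aggregate type, so the shared eventId still produces distinct streams:

using System;

public interface IAggregateRoot { }

public class Accreditation : IAggregateRoot
{
    // identity deliberately shared with Badges: the id of the physical event
    public Guid EventId { get; }

    public Accreditation(Guid eventId) => EventId = eventId;

    // assumed naming convention: prefix the stream with the aggregate type
    public string StreamName => $"accreditation-{EventId}";
}

public class Badges : IAggregateRoot
{
    public Guid EventId { get; }

    public Badges(Guid eventId) => EventId = eventId;

    public string StreamName => $"badges-{EventId}";
}

With this convention the aggregates keep the natural shared identity of Solution Two, while the event streams remain unique per aggregate type.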

Should I have an interface or class for my aggregate root?

Please see the code here, and specifically:

using System.Collections.Generic;

namespace DddInPractice.Logic.Common
{
    public abstract class AggregateRoot
    {
        private readonly List<IDomainEvent> _domainEvents = new List<IDomainEvent>();
        public virtual IReadOnlyList<IDomainEvent> DomainEvents => _domainEvents;

        protected virtual void AddDomainEvent(IDomainEvent newEvent)
        {
            _domainEvents.Add(newEvent);
        }

        public virtual void ClearEvents()
        {
            _domainEvents.Clear();
        }
    }
}

I am debating whether to use this class in my project. I like the idea because it encapsulates Domain Events. If I use it then all Aggregate Root classes will derive from it. Alternatively, I could use a marker interface like this (which all Aggregate Roots will implement):

public interface IAggregateRoot { } 

Should I:

  1. Create the base class or

  2. Create the interface or

  3. Do neither?

I like the idea of marking my Aggregate Roots.
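
To make the trade-off concrete, the base class buys inherited behaviour, not just a label (SnackMachine and MoneyUnloadedEvent below are hypothetical examples, not from the library):

// Hypothetical aggregate: event recording is inherited from AggregateRoot.
public class MoneyUnloadedEvent : IDomainEvent { }

public class SnackMachine : AggregateRoot
{
    public virtual void UnloadMoney()
    {
        // ... domain logic mutating the aggregate ...
        AddDomainEvent(new MoneyUnloadedEvent());  // comes from the base class
    }
}

With only the marker interface, each root would have to re-implement the _domainEvents list and the AddDomainEvent/ClearEvents methods itself.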

Creation of aggregate root that contains references to other aggregate roots

I would like to model the following entities: "Person", "Company", and the aggregate that ties the two together, "Membership". I have identified that Person and Company are aggregate roots. Thus, "Membership" would hold a reference (id) to both of those aggregate roots, plus other value objects/entities that a membership holds (for example, the title that the person has at that company).

In my architecture I have the following layers: API controllers, services, domains, repositories. When creating a Membership object, the controller receives two identifiers (one for the person and one for the company). Currently the service is responsible for making a call to the CompanyService, to ensure that a company with that id exists, and to the PersonService, to ensure that a person with that id exists. However, in the domain model I currently have a constructor that takes in two ids, which makes it feel really anemic. Also, a further iteration will add a list of references to a third aggregate root, Vehicle. Thus a vehicle can exist by itself, or it can also belong to a membership.

Is this a bad way of modelling these entities? Is there a better way? I have read about the notion of domain services and application services, but my application does not currently have that distinction and I don’t know if that concept would help in this case.

Even the behaviour of the aggregate roots feels a bit dry when it comes to functionality related to the other aggregate roots they hold references to: i.e., the Membership domain model would have the ability to "link a car", setting a car for itself, but again, receiving just an identifier that it would add to a list of identifiers.
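
For context, here is a minimal sketch of how I currently picture Membership referencing the other roots by identity only (all names beyond Person/Company/Vehicle are mine, and hypothetical):

using System;
using System.Collections.Generic;

public class Membership : Entity, IAggregateRoot
{
    public Guid PersonId { get; }
    public Guid CompanyId { get; }
    public string Title { get; private set; }

    private readonly List<Guid> _vehicleIds = new List<Guid>();
    public IReadOnlyList<Guid> VehicleIds => _vehicleIds;

    public Membership(Guid personId, Guid companyId, string title)
    {
        PersonId = personId;
        CompanyId = companyId;
        Title = title;
    }

    // "link a car": the aggregate only records the identity; checking that
    // the vehicle actually exists stays with the services, as it does today
    public void LinkVehicle(Guid vehicleId)
    {
        if (!_vehicleIds.Contains(vehicleId))
            _vehicleIds.Add(vehicleId);
    }
}

This is exactly the "anemic" shape I am unhappy about, which is why I am wondering whether the domain service/application service distinction would give these methods more meaningful behaviour.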