Discussion Forum : Understanding the problem.

The Observer design pattern is a very important design pattern, used extensively in the communication and media industry. It has a wide range of applications in the software industry as well.

Let us understand this pattern using the following example.

I am sure most of us are aware of discussion forums, have been part of one at some point, and know roughly how they function.

Let me quickly explain anyway. A discussion forum is a platform that allows like-minded people to interact with each other over topics of interest.

Let's talk about a discussion forum like Yahoo Groups.

What do we think should be the prerequisites for anyone to be an active participant in the Yahoo Groups discussion forum?

The first prerequisite is that every participant should have a valid email ID.

Why is it a prerequisite for the participant to have a valid email ID?

This is because Yahoo Groups is an email-based discussion forum: participants are notified about changes on a subject using their email ID as the reference and email as the communication medium. That is, the participant must provide an interface through which the system can notify them about changes happening on the system.

Once a person has a valid email ID, he can visit the Yahoo Groups website and browse through the thousands of topics available on the forum. He can subscribe to one or more topics depending on his interest. Once he has subscribed, the next time anything changes on any of the subscribed topics, all the participants associated with those topics are notified by email. If a participant wants to share his thoughts, he can reply to the email, and his reply is propagated to all the subscribers associated with that topic; this is how the discussion continues. This goes on until he unsubscribes from a particular topic. Once a person unsubscribes, he won't get any further messages on that subject.
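
The subscribe, notify and unsubscribe cycle described above can be sketched in a few lines of Python. This is only a minimal illustration and not how Yahoo Groups was actually built: the Forum class, its method names and the sample addresses are all hypothetical, and "sending an email" is reduced to returning the list of addresses that would be notified.

```python
from collections import defaultdict


class Forum:
    """Hypothetical in-memory forum: maps each topic to its subscriber emails."""

    def __init__(self):
        # topic name -> set of subscriber email addresses
        self._subscribers = defaultdict(set)

    def subscribe(self, topic, email):
        self._subscribers[topic].add(email)

    def unsubscribe(self, topic, email):
        # discard() does nothing if the address was never subscribed
        self._subscribers[topic].discard(email)

    def post(self, topic, message):
        """Return the addresses that would be emailed for this message.

        Assumption: a real system would send mail here; this sketch only
        reports who is currently subscribed to the topic.
        """
        return sorted(self._subscribers.get(topic, set()))


forum = Forum()
forum.subscribe("JSP", "asha@example.com")
forum.subscribe("JSP", "ravi@example.com")
forum.subscribe("Flex", "ravi@example.com")

# Both JSP subscribers are notified of the first message.
first = forum.post("JSP", "Any JSP tips?")

# After unsubscribing, ravi no longer receives JSP messages,
# but his Flex subscription is untouched.
forum.unsubscribe("JSP", "ravi@example.com")
second = forum.post("JSP", "Anyone still here?")
flex = forum.post("Flex", "Hello Flex")
```

Note that the forum identifies a participant by nothing more than the email address, which is exactly why the valid email ID is a prerequisite.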

So don't we think a discussion forum looks like a very simple system?

Now that we know about discussion forums, let me give you the task of designing a discussion forum of the scale of Yahoo Groups.

What do you think would be the most difficult challenge in designing a discussion forum of the scale of Yahoo Groups?

Let me share a clue with you to help understand the problem.

A discussion forum of the scale of Yahoo Groups can have N topics and M subscribers. Each of these M subscribers can be associated with one or more of the N topics, and both M and N can vary by a large amount over time.

For example, a discussion forum of the scale of Yahoo Groups can have millions of topics and millions, even billions, of subscribers. For a topic like the soccer World Cup there could be billions of subscribers associated with that one topic, and the number of subscribers can vary from 0 to 100 to 100,000 to 10,000,000 to 100,000,000,000.

Now can we think about the most difficult challenge in designing a system like this?

Many a time my participants talk about various problems, ranging from sending millions of emails to storing them in some repository, but most of these problems can be resolved by throwing either extra hardware or extra money at them.

When I ask you to design a discussion forum of the scale of Yahoo Groups, it is pretty much implicit that we will have a server farm with at least 200 servers in it. So let us not think about problems that can be solved by throwing in extra hardware or extra money; let's think about a logical problem, one that cannot be solved by pumping in extra cash or extra hardware.

To understand the exact problem, we need to model it. The following figure shows a graphical view of a small set of people in a discussion forum discussing different UI development frameworks such as JSPs, ASPs, Flex, Microsoft Silverlight, SAP WebDynPro, MFC, AJAX, etc.

Figure: Subscribers color-coded by the UI development topics they discuss.

The different color codes depict the groups of people interested in discussing the various subjects. As can be seen, a subscriber can be part of one or more groups, and the number of groups can be large.

Now, to understand the problem, let us pull out the set of subscribers who want to discuss JSPs. The following figure shows them.

Figure: The subscribers interested in discussing JSPs.

As can be seen in the figure, around 20 participants want to discuss a topic called JSPs.

Discussion means the thoughts of one participant should be propagated to all the other participants, which means that every participant has to keep a reference to the other 19. Think about it technically: we have a system in which the components communicate directly with each other. First, as you can see, the figure looks very chaotic. Second, if a 21st member joins this topic, all the other members are impacted. In a nutshell, any structural or behavioral change has maximum impact on the system.
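
To put numbers on this, here is a small Python sketch of the direct-reference arrangement. The Participant class and the join helper are hypothetical names made up for this illustration; the only point is to count how many references exist and how many existing members must change when someone joins.

```python
class Participant:
    """Hypothetical participant who holds a direct reference to every peer."""

    def __init__(self, name):
        self.name = name
        self.peers = []  # explicit references to all other participants


def join(group, newcomer):
    """Add newcomer to a fully connected group.

    Returns how many existing participants had to be modified.
    """
    touched = 0
    for member in group:
        member.peers.append(newcomer)  # every existing member changes
        newcomer.peers.append(member)  # and the newcomer must learn each one
        touched += 1
    group.append(newcomer)
    return touched


def total_references(group):
    return sum(len(p.peers) for p in group)


group = []
for i in range(20):
    join(group, Participant(f"member-{i + 1}"))

refs_with_20 = total_references(group)  # 20 members x 19 peers each = 380
touched_by_21st = join(group, Participant("member-21"))  # all 20 are modified
refs_with_21 = total_references(group)  # 21 members x 20 peers each = 420
```

With n participants the group carries n x (n - 1) references, and every newcomer forces a change in all n existing members. That is the "maximum impact" in concrete terms.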

So how do we resolve this problem?

We all know we need a mediator to address these issues, since we have a system in which a large number of structural components interact directly with each other.

Hence we will need a mediator for every topic in the discussion forum. But let us understand that this is just one part of the problem and its solution; we still have to understand the other aspects. The following figure shows a mediator for every topic.

Figure: A mediator for every topic.

So now let us find the other problem. We know that communication through a mediator is bidirectional, and it is represented by a single line with arrowheads at both ends. Bidirectional communication can also be drawn as two lines, each with its own arrow, the way it is shown in the next figure.

Figure: Bidirectional communication between the mediator and its colleagues, drawn as two separate lines.

The figure above can help us understand the exact problem. We know that we need a mediator for every topic, but while discussing the Mediator design pattern I had mentioned that a mediator should not mediate more than 10 colleagues; otherwise the mediator itself becomes very complex. Here the number of colleagues can vary from 0 to billions, so think about how complex our mediator will be.

This problem needs to be understood in parts. Bidirectional communication means the colleague keeps a reference to the mediator and the mediator keeps a reference to the colleagues.
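
The two directions of reference can be made concrete with a short Python sketch. TopicMediator and Colleague are hypothetical names chosen for this illustration, and the inbox list is just a stand-in for receiving a message.

```python
class TopicMediator:
    """Hypothetical mediator for a single topic."""

    def __init__(self, topic):
        self.topic = topic
        self.colleagues = []  # mediator -> colleagues: grows with membership

    def register(self, colleague):
        self.colleagues.append(colleague)

    def relay(self, sender, text):
        # Forward the message to every colleague except the sender.
        for colleague in self.colleagues:
            if colleague is not sender:
                colleague.receive(text)


class Colleague:
    def __init__(self, name, mediator):
        self.name = name
        self.mediator = mediator  # colleague -> mediator: always exactly one
        self.inbox = []
        mediator.register(self)

    def say(self, text):
        # The colleague talks only to the mediator, never to other colleagues.
        self.mediator.relay(self, text)

    def receive(self, text):
        self.inbox.append(text)


jsp = TopicMediator("JSP")
members = [Colleague(f"member-{i + 1}", jsp) for i in range(20)]
members[0].say("Hello JSP group")

references_in_mediator = len(jsp.colleagues)  # one per colleague
```

Each colleague holds a single reference no matter how large the topic becomes, while the mediator's list grows by one entry for every colleague that joins.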

Now let us understand that it is very easy for millions of colleagues to know about a single mediator and keep a reference to it, while it is extremely difficult for one mediator to know and keep a reference to millions of colleagues. In the figure below, the link marked with a tick is not a problem, while the link marked with a cross is a big problem.

Figure: The colleague-to-mediator link (tick) and the mediator-to-colleague link (cross).

If the mediator has to communicate with millions of colleagues, it has to keep a reference to each of them, which is extremely difficult.

I can share a quick analogy.

During my training programs I start by introducing myself, giving my name and credentials, and then I ask the participants to introduce themselves, again starting with their names. Now, if you ask any of my participants my name, they will all know it, but if you ask me the names of the participants, I won't remember them even though they told me.

How is this possible?

The same reasoning applies: twenty people knowing a single name is not a problem, but one person having to know 20 names is a big problem, and hence I may not remember their names. But although I don't remember their names, does that mean I don't interact with them? If so, how could I train those participants?

To understand this, we need to understand the concept of explicit referencing and implicit referencing.

Explicit referencing means I know a particular student by his name and his current state, and that's how I train him. But there is a limit to how many participants I can know explicitly: maybe one, two or three, not more than that, isn't it?

Then how does this work? To make it work we need to use the concept of implicit referencing.

In my classroom, although I don't know my participants explicitly, I know that each of them is here to learn design patterns and is willing to learn them (i.e., they have a common interface). Once I presume they all have the same interface, I can interact with all of them, teaching them design patterns, without knowing each one of them explicitly.

Whenever we have to interact with a large number of objects, it is not possible to know each of them explicitly. In that case we can interact with all of them implicitly, provided they all have a common interface.

This is the principle used within discussion forums to solve the problem of a mediator keeping references to millions of colleagues.
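
In code, implicit referencing might look like the following Python sketch. All the names here (Subscriber, EmailSubscriber, CountingSubscriber, Topic) are assumptions made up for illustration, and appending to a list stands in for actually sending an email. The thing to notice is that Topic never asks who a subscriber is or what kind it is; it relies only on the shared notify method.

```python
from abc import ABC, abstractmethod


class Subscriber(ABC):
    """The common interface: the only thing a topic knows about a subscriber."""

    @abstractmethod
    def notify(self, topic, message):
        ...


class EmailSubscriber(Subscriber):
    def __init__(self, address):
        self.address = address
        self.received = []

    def notify(self, topic, message):
        # Stand-in for sending an email to self.address
        self.received.append(f"[{topic}] {message}")


class CountingSubscriber(Subscriber):
    """Hypothetical second kind of subscriber that only counts messages."""

    def __init__(self):
        self.count = 0

    def notify(self, topic, message):
        self.count += 1


class Topic:
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def publish(self, message):
        # No names and no concrete types: just the shared notify() call.
        for subscriber in self._subscribers:
            subscriber.notify(self.name, message)


topic = Topic("Design Patterns")
alice = EmailSubscriber("alice@example.com")
counter = CountingSubscriber()
topic.subscribe(alice)
topic.subscribe(counter)
topic.publish("Session starts at 10")
```

Just as the trainer teaches the whole class without knowing any names, publish reaches every subscriber through the one method they all share, so a new kind of subscriber can be added without changing Topic at all.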

Hemant Jha
Founder - VPlanSolutions
Researcher, Trainer